Synonym(Graph)FilterFactory seems to ignore tokenizerFactory.* parameters.

2018-05-30 Thread Yasufumi Mizoguchi
Hi, community.

I want to use Synonym(Graph)Filter with JapaneseTokenizer and
NGramTokenizer.
But it turned out that Synonym(Graph)FilterFactory seems to ignore
tokenizerFactory.* parameters such as "tokenizerFactory.maxGramSize",
"tokenizerFactory.userDictionary", etc. when using the
managed-schema (ManagedIndexSchema class).

Is this a bug?

I found a similar issue in JIRA
(https://issues.apache.org/jira/browse/SOLR-10010) and also found that Solr
respects the parameters when the
"informResourceLoaderAwareObjectsForFieldType" call made from the
"postReadInform" method is commented out, as seen in the JIRA issue.
(https://github.com/apache/lucene-solr/blob/a03d6bc8c27d3d97011bc5bdc2aeb94c4820628c/solr/core/src/java/org/apache/solr/schema/ManagedIndexSchema.java#L1153)
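
For context, the kind of configuration I mean looks roughly like this (field
type name and dictionary paths are illustrative):

<fieldType name="text_ja_syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.JapaneseTokenizerFactory"
               userDictionary="lang/userdict_ja.txt"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            tokenizerFactory="solr.JapaneseTokenizerFactory"
            tokenizerFactory.userDictionary="lang/userdict_ja.txt"/>
  </analyzer>
</fieldType>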

Thanks,
Yasufumi


Can one set a short node name in Solr Cloud

2018-05-30 Thread Michael Schumann
We are running Solr Cloud version 7.2. The node names in ZooKeeper are very
long: over 50 characters. Is there a way to set a friendlier, shorter name, both
for display purposes in the admin console and for use when interacting with the
collections API?
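
For reference, the registered node name is derived from host, port and
context, e.g. 10.0.0.5:8983_solr; as far as I know the host portion can be
controlled via SOLR_HOST in solr.in.sh (value illustrative), though that is
not a true display alias:

# solr.in.sh
SOLR_HOST="solr1.example.com"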

Thank you, Michael


Re: Solr Cloud 7.3.1 backups

2018-05-30 Thread Greg Roodt
Thanks for the confirmation Shawn. Distributed systems are hard, so this
makes sense.

I have a large, stable cluster (stable in terms of leadership and
performance) with a single shard. The cluster scales up and down with
additional PULL replicas over the day with the traffic curve.

It's going to take a bit of coordination to get all nodes to mount a shared
volume when we take a backup and then unmount when done.

Any idea what happens if a node joins or leaves during a backup?

On Thu, 31 May 2018 at 06:14, Shawn Heisey  wrote:

> On 5/29/2018 3:01 PM, Greg Roodt wrote:
> > What is the best way to perform a backup of a Solr Cloud cluster? Is
> there
> > a way to backup only the leader? From my tests with the collections admin
> > BACKUP command, all nodes in the cluster need to have access to a shared
> > filesystem. Surely that isn't necessary if you are backing up the leader
> or
> > TLOG replica?
>
> If you have more than one Solr instance in your cloud, then all of those
> instances must have access to the same filesystem accessed from the same
> mount point.  Together, they will write the entire collection to various
> subdirectories in that location.
>
> I can't find any mention of whether backups are load balanced across the
> cloud, or if they always use leaders.  I would assume the former.  If
> that's how it works, then you don't know which machine is going to do
> the backup of a given shard.  Even if the backup always uses leaders,
> you can't always be sure of where a leader is.  It can change from
> moment to moment, especially if you're having stability problems with
> your cloud.
>
> At restore time, there's a similar situation.  You don't know which
> machine(s) in the cloud are going to be actually loading index data from
> the backup location.  So they all need to have access to the same data.
>
> Thanks,
> Shawn
>
>


RE: Solr5.4 - Indexing a big file (size = 2.4Go)

2018-05-30 Thread Bruno Mannina
Hi Erick,

I want to index this file because I received this file from my boss.

This file contains around 1.5M docs.

I think I will split this file and index the pieces.
That will be better.
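
Once it is split on <doc> boundaries, the pieces can be posted in one go
(chunk directory is illustrative):

bin/post -port 1234 -c mycollection /home/bruno/chunks/*.xml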

Thanks

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, May 30, 2018 16:50
To: solr-user
Subject: Re: Solr5.4 - Indexing a big file (size = 2.4Go)

Why do you want to index a 2G file in the first place? You can't really do 
anything with it.

If you deliver it to a browser, the browser will churn forever. If you try to 
export it it'll suck up your bandwidth terribly.

If it's a bunch of individual docs (in Solr's xml format) about the only thing 
that makes sense is to break it up.

This sounds like an XY problem, you've asked how to do X (index a 2G
file) without telling us Y (what
the use-case is).

Best,
Erick

On Wed, May 30, 2018 at 7:18 AM, Bruno Mannina  
wrote:
> Dear Solr User,
>
> I got an invalid content length when I try to index my file (an xml file
> with a size of 2.4Go).
>
> I use the SimplePostTool as in the documentation on my ubuntu server:
>
>>bin/post -port 1234 -c mycollection /home/bruno/2013.xml
>
> It works with smaller files but not with this one. I suppose it's the size.
>
> Is there a parameter I can change to allow big files?
>
> I changed the formdataUploadLimitInKB parameter to 4096 and
> multipartUploadLimitInKB to 4096000 in solrconfig, without success.
>
> Do you have an idea?
>
> Many thanks for your help,
>
> Best Regards
>
> Bruno
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus



RE: No solr.log in solr cloud 7.3

2018-05-30 Thread Leonard, Carl
Does your solr.in.sh have a reference to your log4j.properties file, with the
proper path?
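
For example, something like this (path illustrative):

# solr.in.sh
LOG4J_PROPS=/opt/solr/server/resources/log4j.properties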

-Original Message-
From: Shawn Heisey  
Sent: Wednesday, May 30, 2018 1:15 PM
To: solr-user@lucene.apache.org
Subject: Re: No solr.log in solr cloud 7.3

On 5/30/2018 8:40 AM, msaunier wrote:
> Today, I don’t understand why, but I don’t have a solr.log file. I have just:
>
> drwxr-xr-x 1 solr solr 84 mai   30 16:19 archived
>
> -rw-r--r-- 1 solr solr 891352 mai   30 16:29 solr-8983-console.log
>
> -rw-r--r-- 1 solr solr  74068 mai   30 16:34 solr_gc.log.0.current


What procedure did you follow to install Solr?  How was it started? What 
version of Solr?  What OS flavor are you running on?  The answers to these 
questions will help determine where you should be looking.

Thanks,
Shawn



Re: No solr.log in solr cloud 7.3

2018-05-30 Thread Shawn Heisey
On 5/30/2018 8:40 AM, msaunier wrote:
> Today, I don’t understand why, but I don’t have a solr.log file. I have just:
>
> drwxr-xr-x 1 solr solr 84 mai   30 16:19 archived
>
> -rw-r--r-- 1 solr solr 891352 mai   30 16:29 solr-8983-console.log
>
> -rw-r--r-- 1 solr solr  74068 mai   30 16:34 solr_gc.log.0.current


What procedure did you follow to install Solr?  How was it started? 
What version of Solr?  What OS flavor are you running on?  The answers
to these questions will help determine where you should be looking.

Thanks,
Shawn



Re: Solr Cloud 7.3.1 backups

2018-05-30 Thread Shawn Heisey
On 5/29/2018 3:01 PM, Greg Roodt wrote:
> What is the best way to perform a backup of a Solr Cloud cluster? Is there
> a way to backup only the leader? From my tests with the collections admin
> BACKUP command, all nodes in the cluster need to have access to a shared
> filesystem. Surely that isn't necessary if you are backing up the leader or
> TLOG replica?

If you have more than one Solr instance in your cloud, then all of those
instances must have access to the same filesystem accessed from the same
mount point.  Together, they will write the entire collection to various
subdirectories in that location.
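
For reference, a backup request looks something like this (collection name
and shared mount path are illustrative):

curl 'http://localhost:8983/solr/admin/collections?action=BACKUP&name=mybackup&collection=mycollection&location=/mnt/shared/backups'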

I can't find any mention of whether backups are load balanced across the
cloud, or if they always use leaders.  I would assume the former.  If
that's how it works, then you don't know which machine is going to do
the backup of a given shard.  Even if the backup always uses leaders,
you can't always be sure of where a leader is.  It can change from
moment to moment, especially if you're having stability problems with
your cloud.

At restore time, there's a similar situation.  You don't know which
machine(s) in the cloud are going to be actually loading index data from
the backup location.  So they all need to have access to the same data.

Thanks,
Shawn



Re: Find value in Parent doc fields OR Child doc fields

2018-05-30 Thread Mikhail Khludnev
q=Project_Title:QWE {!parent which=path:1.Project v='Submission_No:QWE'}

fixing the quote
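
A full request might look like this, using parameter dereferencing to avoid
the quoting issue entirely (collection name is illustrative):

curl 'http://localhost:8983/solr/projects/select' \
  --data-urlencode 'q=Project_Title:QWE {!parent which=path:1.Project v=$childq}' \
  --data-urlencode 'childq=Submission_No:QWE'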

On Wed, May 30, 2018 at 4:01 AM, kristaclaire14 
wrote:

> Hi,
>
> I want to query/find a value that may match on parent document fields or
> child document fields. Is this possible using block join parent query
> parser? How can I do this with solr nested documents? Here is the example
> data:
>
> [{
> "id":"1001"
> "path":"1.Project",
> "Project_Title":"Sample Project",
> "_childDocuments_":[
> {
> "id":"2001",
> "path":"2.Project.Submission",
> "Submission_No":"1234-QWE",
> "_childDocuments_":[
> {
> "id":"3001",
> "path":"3.Project.Submission.Agency",
> "Agency_Cd":"QWE"
> }
> ]
> }]
> }, {
> "id":"1002"
> "path":"1.Project",
> "Project_Title":"Test Project QWE",
> "_childDocuments_":[
> {
> "id":"2002",
> "path":"2.Project.Submission",
> "Submission_No":"4567-AGY",
> "_childDocuments_":[
> {
> "id":"3002",
> "path":"3.Project.Submission.Agency",
> "Agency_Cd":"AGY"
> }]
> },{
> "id":"2003",
> "path":"2.Project.Submission",
> "Submission_No":"7891-QWE",
> "_childDocuments_":[
> {
> "id":"3003",
> "path":"3.Project.Submission.Agency",
> "Agency_Cd":"QWE"
> }]
> }]
> }]
>
> I want to retrieve all Projects with Project_Title:*QWE* OR
> Submission_Submission_No:*QWE*. Thanks in advance.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Find value in Parent doc fields OR Child doc fields

2018-05-30 Thread Mikhail Khludnev
q=Project_Title:QWE {!parent which=path:1.Project v='Submission_No:QWE}

On Wed, May 30, 2018 at 4:01 AM, kristaclaire14 
wrote:

> Hi,
>
> I want to query/find a value that may match on parent document fields or
> child document fields. Is this possible using block join parent query
> parser? How can I do this with solr nested documents? Here is the example
> data:
>
> [{
> "id":"1001"
> "path":"1.Project",
> "Project_Title":"Sample Project",
> "_childDocuments_":[
> {
> "id":"2001",
> "path":"2.Project.Submission",
> "Submission_No":"1234-QWE",
> "_childDocuments_":[
> {
> "id":"3001",
> "path":"3.Project.Submission.Agency",
> "Agency_Cd":"QWE"
> }
> ]
> }]
> }, {
> "id":"1002"
> "path":"1.Project",
> "Project_Title":"Test Project QWE",
> "_childDocuments_":[
> {
> "id":"2002",
> "path":"2.Project.Submission",
> "Submission_No":"4567-AGY",
> "_childDocuments_":[
> {
> "id":"3002",
> "path":"3.Project.Submission.Agency",
> "Agency_Cd":"AGY"
> }]
> },{
> "id":"2003",
> "path":"2.Project.Submission",
> "Submission_No":"7891-QWE",
> "_childDocuments_":[
> {
> "id":"3003",
> "path":"3.Project.Submission.Agency",
> "Agency_Cd":"QWE"
> }]
> }]
> }]
>
> I want to retrieve all Projects with Project_Title:*QWE* OR
> Submission_Submission_No:*QWE*. Thanks in advance.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



-- 
Sincerely yours
Mikhail Khludnev


Re: CURL command problem on Solr

2018-05-30 Thread Christopher Schultz

Roee,

On 5/30/18 3:38 AM, Roee T wrote:
> Thank you so much all of you the following worked for me!
> 
> curl -X PUT -H "Content-Type: application/json" -d
> "@Myfeatures.json" 
> "http://localhost:8983/solr/techproducts/schema/feature-store";

Curl assumes that the URL is the last argument to the program, so it
stops reading options (left-to-right) when it gets to the URL.

So if you put options after the URL they will be ignored.

- -chris


RE: Solr5.4 - Indexing a big file (size = 2.4Go)

2018-05-30 Thread Leonard, Carl
Is it one document that is 2.4 GB, or is that 2.4 GB spread over several documents?

There are some limits in solrconfig.xml.  Perhaps you are hitting the 
multipartUploadLimitInKB?




-Original Message-
From: Erick Erickson  
Sent: Wednesday, May 30, 2018 7:50 AM
To: solr-user 
Subject: Re: Solr5.4 - Indexing a big file (size = 2.4Go)

Why do you want to index a 2G file in the first place? You can't really do 
anything with it.

If you deliver it to a browser, the browser will churn forever. If you try to 
export it it'll suck up your bandwidth terribly.

If it's a bunch of individual docs (in Solr's xml format) about the only thing 
that makes sense is to break it up.

This sounds like an XY problem, you've asked how to do X (index a 2G
file) without telling us Y (what
the use-case is).

Best,
Erick

On Wed, May 30, 2018 at 7:18 AM, Bruno Mannina  
wrote:
> Dear Solr User,
>
> I got an invalid content length when I try to index my file (an xml file
> with a size of 2.4Go).
>
> I use the SimplePostTool as in the documentation on my ubuntu server:
>
>>bin/post -port 1234 -c mycollection /home/bruno/2013.xml
>
> It works with smaller files but not with this one. I suppose it's the size.
>
> Is there a parameter I can change to allow big files?
>
> I changed the formdataUploadLimitInKB parameter to 4096 and
> multipartUploadLimitInKB to 4096000 in solrconfig, without success.
>
> Do you have an idea?
>
> Many thanks for your help,
>
> Best Regards
>
> Bruno
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus


CURL DELETE BLOB not working in Solr 7.3 cloud

2018-05-30 Thread msaunier
Hello,

I want to delete a file in the blob store, but this command does not work:

curl -X DELETE "http://srv-formation-solr3:8983/solr/.system/blob/CityaUpdateProcessorJar/14"

It just returns the file information:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":1},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"CityaUpdateProcessorJar/14",
        "md5":"45aeda5a01607fb668cec26a45cac9e6",
        "blobName":"CityaUpdateProcessorJar",
        "version":14,
        "timestamp":"2018-05-30T12:59:40.419Z",
        "size":22483}]
  }}

Is my command wrong?

Thanks,

Maxence,
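
One possibly relevant detail: blobs are ordinary documents in the .system
collection, so a standard delete-by-id update should remove the entry (a
sketch, not an official blob API):

curl -H 'Content-Type: application/json' \
  'http://srv-formation-solr3:8983/solr/.system/update?commit=true' \
  -d '{"delete":{"id":"CityaUpdateProcessorJar/14"}}'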



Re: Find value in Parent doc fields OR Child doc fields

2018-05-30 Thread Erick Erickson
Asher:

Please follow the instructions here:
http://lucene.apache.org/solr/community.html#mailing-lists-irc. You
must use the _exact_ same e-mail as you used to subscribe.

If the initial try doesn't work and following the suggestions at the
"problems" link doesn't work for you, let us know. But note you need
to show us the _entire_ return header to allow anyone to diagnose the
problem.

Best,
Erick

On Wed, May 30, 2018 at 12:05 AM, Aniket Khare  wrote:
> Please refer to the link below and check the [subquery] document transformer.
>
> https://lucene.apache.org/solr/guide/6_6/transforming-result-documents.html
>
> On Wed, May 30, 2018 at 6:31 AM, kristaclaire14 
> wrote:
>
>> Hi,
>>
>> I want to query/find a value that may match on parent document fields or
>> child document fields. Is this possible using block join parent query
>> parser? How can I do this with solr nested documents? Here is the example
>> data:
>>
>> [{
>> "id":"1001"
>> "path":"1.Project",
>> "Project_Title":"Sample Project",
>> "_childDocuments_":[
>> {
>> "id":"2001",
>> "path":"2.Project.Submission",
>> "Submission_No":"1234-QWE",
>> "_childDocuments_":[
>> {
>> "id":"3001",
>> "path":"3.Project.Submission.Agency",
>> "Agency_Cd":"QWE"
>> }
>> ]
>> }]
>> }, {
>> "id":"1002"
>> "path":"1.Project",
>> "Project_Title":"Test Project QWE",
>> "_childDocuments_":[
>> {
>> "id":"2002",
>> "path":"2.Project.Submission",
>> "Submission_No":"4567-AGY",
>> "_childDocuments_":[
>> {
>> "id":"3002",
>> "path":"3.Project.Submission.Agency",
>> "Agency_Cd":"AGY"
>> }]
>> },{
>> "id":"2003",
>> "path":"2.Project.Submission",
>> "Submission_No":"7891-QWE",
>> "_childDocuments_":[
>> {
>> "id":"3003",
>> "path":"3.Project.Submission.Agency",
>> "Agency_Cd":"QWE"
>> }]
>> }]
>> }]
>>
>> I want to retrieve all Projects with Project_Title:*QWE* OR
>> Submission_Submission_No:*QWE*. Thanks in advance.
>>
>>
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
>
>
>
> --
> Regards,
>
> Aniket S. Khare


Re: Solr5.4 - Indexing a big file (size = 2.4Go)

2018-05-30 Thread Erick Erickson
Why do you want to index a 2G file in the first place? You can't
really do anything with it.

If you deliver it to a browser, the browser will churn forever. If you
try to export it it'll suck up
your bandwidth terribly.

If it's a bunch of individual docs (in Solr's xml format) about the
only thing that makes sense is to break it up.

This sounds like an XY problem, you've asked how to do X (index a 2G
file) without telling us Y (what
the use-case is).

Best,
Erick

On Wed, May 30, 2018 at 7:18 AM, Bruno Mannina
 wrote:
> Dear Solr User,
>
> I got an invalid content length when I try to index my file (an xml file
> with a size of 2.4Go).
>
> I use the SimplePostTool as in the documentation on my ubuntu server:
>
>>bin/post -port 1234 -c mycollection /home/bruno/2013.xml
>
> It works with smaller files but not with this one. I suppose it's the size.
>
> Is there a parameter I can change to allow big files?
>
> I changed the formdataUploadLimitInKB parameter to 4096 and
> multipartUploadLimitInKB to 4096000 in solrconfig, without success.
>
> Do you have an idea?
>
> Many thanks for your help,
>
> Best Regards
>
> Bruno
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus


Re: CURL command problem on Solr

2018-05-30 Thread Roee T
Thank you so much, all of you. The following worked for me!

curl -X PUT -H "Content-Type: application/json" -d "@Myfeatures.json"
"http://localhost:8983/solr/techproducts/schema/feature-store";






--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr5.4 - Indexing a big file (size = 2.4Go)

2018-05-30 Thread Bruno Mannina
Dear Solr User,

I got an invalid content length when I try to index my file (an xml file with a
size of 2.4Go).

I use the SimplePostTool as in the documentation on my ubuntu server:

>bin/post -port 1234 -c mycollection /home/bruno/2013.xml

It works with smaller files but not with this one. I suppose it's the size.

Is there a parameter I can change to allow big files?

I changed the formdataUploadLimitInKB parameter to 4096 and
multipartUploadLimitInKB to 4096000 in solrconfig, without success.

Do you have an idea?

Many thanks for your help,

Best Regards

Bruno

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
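
For reference, these limits live on the <requestParsers> element inside
<requestDispatcher> in solrconfig.xml; a sketch with illustrative values:

<requestDispatcher>
  <requestParsers enableRemoteStreaming="false"
                  multipartUploadLimitInKB="4096000"
                  formdataUploadLimitInKB="4096"
                  addHttpRequestToContext="false"/>
</requestDispatcher>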


No solr.log in solr cloud 7.3

2018-05-30 Thread msaunier
Hello,

 

Today, I don’t understand why, but I don’t have a solr.log file. I have just:

drwxr-xr-x 1 solr solr 84 mai   30 16:19 archived
-rw-r--r-- 1 solr solr 891352 mai   30 16:29 solr-8983-console.log
-rw-r--r-- 1 solr solr  74068 mai   30 16:34 solr_gc.log.0.current

My log4j.properties:

 

# Default Solr log4j config
# rootLogger log level may be programmatically overridden by -Dsolr.log.level
solr.log=${solr.log.dir}
log4j.rootLogger=INFO, file, CONSOLE

# Console appender will be programmatically disabled when Solr is started
# with option -Dsolr.log.muteconsole
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n

#- size rotation with log cleanup.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.MaxFileSize=32MB
log4j.appender.file.MaxBackupIndex=10

#- File to log to and log format
log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p (%t) [%X{collection} %X{shard} %X{replica} %X{core}] %c{1.} %m%n

# Adjust logging levels that should differ from root logger
log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.apache.hadoop=WARN
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.server.Server=INFO
log4j.logger.org.eclipse.jetty.server.ServerConnector=INFO

# set to INFO to enable infostream log messages
log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF

# start of additions by FH
#log4j.appender.CONSOLE.threshold=debug
#log4j.logger.com.citya=ALL
# end of additions by FH

Thanks,

Maxence,



Re: Weird behavioural differences between pf in dismax and edismax

2018-05-30 Thread Alessandro Benedetti
A question in general for the community:
what is the dismax capable of doing that the edismax is not?
Is it really necessary to keep both of them, or could the dismax be
deprecated?

Cheers



-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Disadvantages of having Zookeeper instance and Solr instance in the same server

2018-05-30 Thread Shawn Heisey

On 5/29/2018 11:27 PM, solr2020 wrote:

What are the pros and cons of having a Zookeeper instance and a Solr instance in
the same VM/server in a production environment?


If you have sufficient CPU, memory, and I/O resources on the system for
both roles, there is no problem with putting both on the same server.
If possible, the ZK database should be on a separate physical disk from the
one Solr uses for indexes, but if the index is not handling a high load, even
that may not be necessary.


The major recommendation is that the ZK server should not be embedded in 
Solr.  It should be a separate process, so that it doesn't go down when 
Solr is stopped.


Putting ZK and Solr on completely separate physical servers does give 
the best results, but for many setups, isn't strictly required.
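
For example, running ZooKeeper as its own process and pointing Solr at it
looks like this (paths and address illustrative):

/opt/zookeeper/bin/zkServer.sh start
bin/solr start -c -z localhost:2181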


Thanks,
Shawn



Re: Impact/Performance of maxDistErr

2018-05-30 Thread David Smiley
I suggest using the "Intersects" spatial predicate when either the data is
all points or the query is a point.  It's semantically equivalent and
the algorithm is much faster.
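
For example, with indexed polygons and a query point (field name
illustrative), the lookup would be:

fq=geom:"Intersects(POINT(-74.093 41.042))"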

On Wed, May 30, 2018 at 3:25 AM Jens Viebig  wrote:

> Thanks for the detailed answer David, that helps a lot to understand!
> Best Regards
>
> Jens
>
> P.S. Currently the only search we are doing on the polygon is
> Contains(POINT(x,y))
>
>
> On 29.05.2018 at 13:30, David Smiley wrote:
>
> Hello Jens,
> With solr.RptWithGeometrySpatialField, you always get an accurate result
> thanks to the "WithGeometry" part.  The "Rpt" part is a grid index, and
> most of the parameters pertain to that.  maxDistErr controls the highest
> resolution grid.  No shape will be indexed to higher resolutions than this,
> though may be courser resolutions dependent on distErrPct.  The
> configuration you chose initially (that turned out to be slow for you) was
> a meter, and then you changed it to a kilometer and got fast indexing
> results.  I figure the size of your indexed shapes are on average a
> kilometer in size (give or take an order of magnitude).  It's hard to guess
> how your query shapes compare to your indexed shapes as there are multiple
> possibilities that could yield similar query performance when changing
> maxDistErr so much.
>
> The bottom line is that you should dial up maxDistErr as much as you can
> get away with it -- which is as long as query performance is good.  So you
> did the right thing :-).  That number will probably be a distance somewhat
> less than the average indexed shape diameter, or average query shape
> diameter, whichever is greater.  Perhaps 1/10th smaller; if I had to pick.
> The default setting, I think a meter, is probably not a good default for
> this field type.
>
> Note you could also try increasing distErrPct some, maybe to as much as
> .25, though I wouldn't go much higher, as it may yield gridded shapes that
> are so coarse as to not have interior cells.  Depending on what your query
> shapes typically look like and indexed shapes relative to each other, that
> may be significant or may not be.  If the indexed shapes are often much
> larger than your query shape then it's significant.
>
> ~ David
>
> On Fri, May 25, 2018 at 6:59 AM Jens Viebig  wrote:
>
>> Hello,
>>
>> we are indexing a polygon with 4 points (non-rectangular, field-of-view
>> of a camera) in a RptWithGeometrySpatialField alongside some more fields,
>> to perform searches that check if a point is within this polygon
>>
>> We started using the default configuration found in several examples
>> online:
>>
>> <fieldType class="solr.RptWithGeometrySpatialField"
>>    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
>>    geo="true" distErrPct="0.15" maxDistErr="0.001"
>>    distanceUnits="kilometers" />
>>
>> We discovered that with this setting the indexing (soft commit) speed is
>> very slow
>> For 1 documents it takes several minutes to finish the commit
>>
>> If we disable this field, indexing+soft commit is only 3 seconds for
>> 1 docs,
>> if we set maxDistErr to 1, indexing speed is at around 5 seconds, so a
>> huge performance gain against the several minutes we had before
>>
>> I tried to find out via the documentation what the impact of
>> "maxDistErr" is on search results, but didn't quite find an in-depth explanation.
>> From the tests we did, the search results still seem to be very accurate
>> even if the covered space of the polygon is less than 1km, and search speed
>> did not suffer.
>>
>> So i would love to learn more about the differences on having
>> maxDistErr="0.001" vs maxDistErr="1" on a RptWithGeometrySpatialField and
>> what problems could we run into with the bigger value
>>
>> Thanks
>> Jens
>>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: solr-extracting features values

2018-05-30 Thread Alessandro Benedetti
The current feature extraction implementation in Solr is oriented to the
Learning To Rank re-ranking capability; it is not built for feature
extraction (to then train your model).

I am afraid you will need to implement your own system that runs multiple
queries against Solr with feature extraction enabled and then parses the
results to build your training set.
Do you have query-level or query-dependent features?
If you are lucky enough to have only document-level features, you may end
up in a slightly simplified scenario.
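
For example, the feature-vector transformer is enabled on an ordinary query
like this (feature store name and efi parameter are illustrative):

http://localhost:8983/solr/techproducts/query?q=test&fl=id,score,[features store=myFeatureStore efi.user_query=test]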

Cheers



-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Impact/Performance of maxDistErr

2018-05-30 Thread Jens Viebig

Thanks for the detailed answer David, that helps a lot to understand!

Best Regards
Jens

P.S. Currently the only search we are doing on the polygon is 
Contains(POINT(x,y))


On 29.05.2018 at 13:30, David Smiley wrote:

Hello Jens,
With solr.RptWithGeometrySpatialField, you always get an accurate 
result thanks to the "WithGeometry" part.  The "Rpt" part is a grid 
index, and most of the parameters pertain to that.  maxDistErr 
controls the highest resolution grid.  No shape will be indexed to 
higher resolutions than this, though it may use coarser resolutions
dependent on distErrPct.  The configuration you chose initially (that 
turned out to be slow for you) was a meter, and then you changed it to 
a kilometer and got fast indexing results.  I figure the size of your 
indexed shapes are on average a kilometer in size (give or take an 
order of magnitude).  It's hard to guess how your query shapes compare 
to your indexed shapes as there are multiple possibilities that could 
yield similar query performance when changing maxDistErr so much.


The bottom line is that you should dial up maxDistErr as much as you 
can get away with it -- which is as long as query performance is good. 
So you did the right thing :-).  That number will probably be a 
distance somewhat less than the average indexed shape diameter, or 
average query shape diameter, whichever is greater.  Perhaps 1/10th 
smaller; if I had to pick.  The default setting, I think a meter, is 
probably not a good default for this field type.


Note you could also try increasing distErrPct some, maybe to as much 
as .25, though I wouldn't go much higher, as it may yield gridded
shapes that are so coarse as to not have interior cells.  Depending on
what your query shapes typically look like and indexed shapes relative 
to each other, that may be significant or may not be.  If the indexed 
shapes are often much larger than your query shape then it's significant.


~ David

On Fri, May 25, 2018 at 6:59 AM Jens Viebig wrote:


Hello,

we are indexing a polygon with 4 points (non-rectangular,
field-of-view of a camera) in a RptWithGeometrySpatialField
alongside some more fields, to perform searches that check if a
point is within this polygon

We started using the default configuration found in several
examples online:

<fieldType class="solr.RptWithGeometrySpatialField"
   spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
   geo="true" distErrPct="0.15" maxDistErr="0.001"
   distanceUnits="kilometers" />

We discovered that with this setting the indexing (soft commit)
speed is very slow
For 1 documents it takes several minutes to finish the commit

If we disable this field, indexing+soft commit is only 3 seconds
for 1 docs,
if we set maxDistErr to 1, indexing speed is at around 5 seconds,
so a huge performance gain against the several minutes we had before

I tried to find out via the documentation what the impact of
"maxDistErr" is on search results, but didn't quite find an in-depth
explanation.
From the tests we did, the search results still seem to be very
accurate even if the covered space of the polygon is less than 1km,
and search speed did not suffer.

So i would love to learn more about the differences on having
maxDistErr="0.001" vs maxDistErr="1" on a
RptWithGeometrySpatialField and what problems could we run into
with the bigger value

Thanks
Jens

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book: 
http://www.solrenterprisesearchserver.com





Re: Find value in Parent doc fields OR Child doc fields

2018-05-30 Thread Aniket Khare
Please refer to the link below and check the [subquery] document
transformer.

https://lucene.apache.org/solr/guide/6_6/transforming-result-documents.html

On Wed, May 30, 2018 at 6:31 AM, kristaclaire14 
wrote:

> Hi,
>
> I want to query/find a value that may match on parent document fields or
> child document fields. Is this possible using block join parent query
> parser? How can I do this with solr nested documents? Here is the example
> data:
>
> [{
> "id":"1001"
> "path":"1.Project",
> "Project_Title":"Sample Project",
> "_childDocuments_":[
> {
> "id":"2001",
> "path":"2.Project.Submission",
> "Submission_No":"1234-QWE",
> "_childDocuments_":[
> {
> "id":"3001",
> "path":"3.Project.Submission.Agency",
> "Agency_Cd":"QWE"
> }
> ]
> }]
> }, {
> "id":"1002"
> "path":"1.Project",
> "Project_Title":"Test Project QWE",
> "_childDocuments_":[
> {
> "id":"2002",
> "path":"2.Project.Submission",
> "Submission_No":"4567-AGY",
> "_childDocuments_":[
> {
> "id":"3002",
> "path":"3.Project.Submission.Agency",
> "Agency_Cd":"AGY"
> }]
> },{
> "id":"2003",
> "path":"2.Project.Submission",
> "Submission_No":"7891-QWE",
> "_childDocuments_":[
> {
> "id":"3003",
> "path":"3.Project.Submission.Agency",
> "Agency_Cd":"QWE"
> }]
> }]
> }]
>
> I want to retrieve all Projects with Project_Title:*QWE* OR
> Submission_Submission_No:*QWE*. Thanks in advance.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



-- 
Regards,

Aniket S. Khare


Re: Disadvantages of having Zookeeper instance and Solr instance in the same server

2018-05-30 Thread Aniket Khare
Hi,

Solr and ZooKeeper both require a lot of I/O, and in this case there will be
an issue where Solr delays ZooKeeper's write operations. So it is
recommended to have an external ZooKeeper ensemble.
Please refer below link for more details.

"ZooKeeper's design assumes that it has extremely fast access to its
database. If the ZooKeeper database is stored on the same disks that hold
the Solr data, any performance problems with Solr will delay ZooKeeper's
access to its own database. This can lead to a performance death spiral
where each ZK timeout results in recovery operations which cause further
timeouts.

ZooKeeper holds its database in Java heap memory, so disk read performance
isn't quite as critical as disk write performance. In situations where the
OS disk cache is too small for Solr's needs and the ZK database is on the
same disk as Solr data, a large amount of disk access for Solr can
interfere with ZK writes. Using very fast disks for ZK (SSD in particular)
will result in good performance. Using separate physical disks for Solr and
ZK data is strongly recommended. Having dedicated machines for all ZK nodes
(a minimum of three nodes are required for redundancy) is even better, but
not strictly a requirement."

https://wiki.apache.org/solr/SolrPerformanceProblems

Regards,

Aniket S. Khare


On Wed, May 30, 2018 at 10:57 AM, solr2020  wrote:

> Hi,
>
> What are the pros and cons of having a Zookeeper instance and a Solr instance in
> the same VM/server in a production environment?
>
> Thanks.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



-- 
Regards,

Aniket S. Khare