Re: how to get rid of double quotes in solr

2020-04-13 Thread sefty nindyastuti
The data that I use is a log from Hadoop; my problem is with the Hadoop log from
the cluster.
The pipeline I use is filebeat --> logstash --> solr. I use a Logstash config
to parse the Hadoop log: the Hadoop log is fed into Logstash via
Filebeat, and then the output from Logstash is indexed into Solr.

On Mon, Apr 13, 2020 at 19:07 Erick Erickson <
erickerick...@gmail.com> wrote:

> I don’t quite know what you’re asking about. Is that input to Solr? Or is
> it output from logstash?
>
> What are you indexing? Because that doesn't look like data from a solr log.
>
> You might want to review: https://wiki.apache.org/solr/UsingMailingLists
>
> Best,
> Erick
>
> > On Apr 13, 2020, at 12:24 AM, sefty nindyastuti 
> wrote:
> >
> > I have a problem when indexing log data clusters in solr using logstash
> and filebeat. there are double quotes in the solr index results,
> > how to solve this problem, please help
> >
> > expect the results of the index that appears in solr as below:
> >
> >  {
> > "input": "log",
> > "hostname": "localhost",
> > "id": "22eddbc9-e60f-29cd-a352-b40154ba1736",
> > "type": "filebeat",
> > "ephemeral_id": "1a31d6e0-8ed9-1307-215f-5dfd361364c9",
> > "version": "7.6.1",
> > "offset": "2061794 ",
> > "path": " /var/log/hadoop/hdfs/hadoop-hdfs-secondarynamenode-xx.log ",
> > "host": "localhostxxx",
> > "message": "2020-04-11 19: 04: 28,575 INFO common.Util
> (Util.java:receiveFile(314)) - Combined time for file downloads and fsync
> to all disks stores 0.02s. The file download stores 0.02s at 58750.00 KB /
> s Synchronous (fsync) write to disk of / hadoop / hdfs / namesecondary /
> current / edits_tmp_ "
> > }
> >
>
>


Re: Fuzzy search not working

2020-04-13 Thread Deepu
Corrected the typo.

Hi Team,

We have 8 text fields (*_txt_en) in the schema and one multi-valued text field
which is a copyField target of the other text fields, like below.

title_txt_en, configuration_summary_txt_en, all_text_txt_ens (multi-valued
field)

Observed one issue with fuzzy match: the same term with a distance of two (~2)
works on individual fields but does not return any results from the
multi-valued field.

The term we used is "probl" and the document has the term "problem" in two text
fields, so the all_text field has two occurrences of "problem".



title_txt_en:probl~2 (gives results)

all_text_txt_ens:probl~2 (no results)



Are there any other factors involved in the distance calculation other
than the Damerau-Levenshtein distance algorithm?

What might be the reason the same input with the same distance worked with one
field and failed with the other field in the same collection?

Is there a way we can get the actual distance Solr calculated for a specific
document and a specific field?



Thanks in advance !!


Thanks,

Pradeep

On Mon, Apr 13, 2020 at 2:35 PM Deepu  wrote:

> Hi Team,
>
> We have 8 text fields (*_txt_en) in schema and one multi valued text field
> which is copy field of other text fields, like below.
>
> tittle_txt_en, configuration_summary_txt_en, all_text_txt_ens (multi value
> field)
>
> Observed one issue with Fuzzy match, same term with distance of two(~2) is
> working on individual fields but not returning any results from multi
> valued field.
>
> Term we used is "prob" and document has "problem" term in two text fields,
> so all_text field has two occurrences of 'problem" terms.
>
>
>
> title_txt_en:prob~2. (given results)
>
> all_text_txt_ens:prob~2 (no results)
>
>
>
> is there any other factors involved in distance calculation other
> than Damerau-Levenshtein Distance algoritham?
>
> what might be the reason same input with same distance worked with one
> field and failed with other field in same collection?
>
> is there a way we can get actual distance solr calculated w.r.t specific
> document and specific field ?
>
>
>
> Thanks in advance !!
>
>
> Thanks,
>
> Pradeep
>


Re: Fuzzy search not working

2020-04-13 Thread Deepu
Hi Walter,

It's a typo; the actual input term was "probl". Sorry for the typo.

Thanks,
Pradeep

On Mon, Apr 13, 2020 at 3:46 PM Walter Underwood 
wrote:

> You need to add three letters to “prob” to get “problem”, so it is edit
> distance 3.
> Fuzzy only works to distance 2.
>
> If you want to match prefixes, edge n-grams are a better approach.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Apr 13, 2020, at 2:35 PM, Deepu  wrote:
> >
> > Hi Team,
> >
> > We have 8 text fields (*_txt_en) in schema and one multi valued text
> field
> > which is copy field of other text fields, like below.
> >
> > tittle_txt_en, configuration_summary_txt_en, all_text_txt_ens (multi
> value
> > field)
> >
> > Observed one issue with Fuzzy match, same term with distance of two(~2)
> is
> > working on individual fields but not returning any results from multi
> > valued field.
> >
> > Term we used is "prob" and document has "problem" term in two text
> fields,
> > so all_text field has two occurrences of 'problem" terms.
> >
> >
> >
> > title_txt_en:prob~2. (given results)
> >
> > all_text_txt_ens:prob~2 (no results)
> >
> >
> >
> > is there any other factors involved in distance calculation other
> > than Damerau-Levenshtein Distance algoritham?
> >
> > what might be the reason same input with same distance worked with one
> > field and failed with other field in same collection?
> >
> > is there a way we can get actual distance solr calculated w.r.t specific
> > document and specific field ?
> >
> >
> >
> > Thanks in advance !!
> >
> >
> > Thanks,
> >
> > Pradeep
>
>


Re: Fuzzy search not working

2020-04-13 Thread Walter Underwood
You need to add three letters to “prob” to get “problem”, so it is edit 
distance 3.
Fuzzy only works to distance 2.

If you want to match prefixes, edge n-grams are a better approach.
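
For reference, the distance-2 ceiling is a hard limit in Lucene itself. A minimal
Lucene-level sketch (field and term taken from this thread; not Solr-specific):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.FuzzyQuery;
    import org.apache.lucene.util.automaton.LevenshteinAutomata;

    public class FuzzyDistanceLimit {
        public static void main(String[] args) {
            Term term = new Term("title_txt_en", "prob");

            // 2 is the maximum edit distance Lucene supports for fuzzy queries.
            System.out.println(LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE); // prints 2
            System.out.println(new FuzzyQuery(term, 2));   // prob~2: accepted

            try {
                new FuzzyQuery(term, 3);                   // prob~3: not supported
            } catch (IllegalArgumentException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }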

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 13, 2020, at 2:35 PM, Deepu  wrote:
> 
> Hi Team,
> 
> We have 8 text fields (*_txt_en) in schema and one multi valued text field
> which is copy field of other text fields, like below.
> 
> tittle_txt_en, configuration_summary_txt_en, all_text_txt_ens (multi value
> field)
> 
> Observed one issue with Fuzzy match, same term with distance of two(~2) is
> working on individual fields but not returning any results from multi
> valued field.
> 
> Term we used is "prob" and document has "problem" term in two text fields,
> so all_text field has two occurrences of 'problem" terms.
> 
> 
> 
> title_txt_en:prob~2. (given results)
> 
> all_text_txt_ens:prob~2 (no results)
> 
> 
> 
> is there any other factors involved in distance calculation other
> than Damerau-Levenshtein Distance algoritham?
> 
> what might be the reason same input with same distance worked with one
> field and failed with other field in same collection?
> 
> is there a way we can get actual distance solr calculated w.r.t specific
> document and specific field ?
> 
> 
> 
> Thanks in advance !!
> 
> 
> Thanks,
> 
> Pradeep



Re: SolrJ connection leak with SolrCloud and Jetty Gzip compression enabled

2020-04-13 Thread Samuel Garcia Martinez
Reading the last two paragraphs again, I realized that those two in particular are 
very poorly worded (grammar 😓). I tried to rephrase them and correct some of 
the errors below.

Here I can see three different problems:

* HttpSolrCall should not use HttpServletResponse#setCharacterEncoding to set 
the Content-Encoding header. This is obviously a mistake.
* HttpSolrClient, specifically the HttpClientUtil, should be modified so that if 
the Content-Encoding header lies about the actual content, the connection is not 
leaked forever. It should still throw the exception, though.
* HttpSolrClient should allow clients to customize HttpClient's 
connectionRequestTimeout, preventing the application from being blocked forever 
waiting for a connection to become available (see the sketch below). This way, the 
application could still respond to requests that won’t use Solr instead of rejecting 
every incoming request because all threads are blocked forever waiting for a 
connection that won’t ever become available.
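
For context, connectionRequestTimeout is the Apache HttpClient RequestConfig setting 
that bounds how long a caller waits for a free connection from the pool. A minimal 
plain-HttpClient sketch of what exposing it would buy (not SolrJ, since the point 
above is precisely that HttpSolrClient does not expose it; URL and timeout values 
are illustrative):

    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    public class ConnectionRequestTimeoutDemo {
        public static void main(String[] args) throws Exception {
            // Without connectionRequestTimeout, a caller waits forever when the pool
            // is exhausted (e.g. by leaked connections); with it, the wait fails fast.
            RequestConfig config = RequestConfig.custom()
                    .setConnectionRequestTimeout(2000)   // ms to wait for a pooled connection
                    .build();
            try (CloseableHttpClient http = HttpClients.custom()
                    .setDefaultRequestConfig(config)
                    .setMaxConnTotal(10)
                    .build()) {
                http.execute(new HttpGet("http://localhost:8983/solr/admin/info/system")).close();
            }
        }
    }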

I think the first two points are bugs that should be fixed. The third one is a 
feature improvement to me.

Unless I missed something, I'll file the two bugs and provide a patch for them. 
The same goes for the feature improvement.










From: Samuel Garcia Martinez 
Sent: Monday, April 13, 2020 10:08:36 PM
To: solr-user@lucene.apache.org 
Subject: SolrJ connection leak with SolrCloud and Jetty Gzip compression enabled

Hi!

Today, I've seen a weird issue in production workloads when the gzip 
compression was enabled. After some minutes, the client app ran out of 
connections and stopped responding.

The cluster setup is pretty simple:
Solr version: 7.7.2
Solr cloud enabled
Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 
HTTP LB using Round Robin over all nodes
All cluster nodes have gzip enabled for all paths, all HTTP verbs and all MIME 
types.
Solr client: HttpSolrClient targeting the HTTP LB

Problem description: when the Solr node that receives the request has to 
forward the request to a Solr Node that actually can perform the query, the 
response headers are added incorrectly to the client response, causing the 
SolrJ client to fail and to never release the connection back to the pool.

To simplify the case, let's try to start from the following repro scenario:

  *   Start one node with cloud mode and port 8983
  *   Create one single collection (1 shard, 1 replica)
  *   Start another node with port 8984 and the previously started zk (-z 
localhost:9983)
  *   Start a java application and query the cluster using the node on port 
8984 (the one that doesn't host the collection)

So, the steps occur like:

  *   The application queries node:8984 with compression enabled 
("Accept-Encoding: gzip") and wt=javabin
  *   Node:8984 can't perform the query and creates an HTTP request behind the 
scenes to node:8983
  *   Node:8983 returns a gzipped response with "Content-Encoding: gzip" and 
"Content-Type: application/octet-stream"
  *   Node:8984 adds the "Content-Encoding: gzip" value as the character encoding of 
the response (it should be forwarded as a "Content-Encoding" header, not as the 
character encoding)
  *   HttpSolrClient receives a "Content-Type: 
application/octet-stream;charset=gzip", causing an exception.
  *   HttpSolrClient tries to quietly close the connection, but since the 
stream is broken, the Utils.consumeFully fails to actually consume the entity 
(it throws another exception in GzipDecompressingEntity#getContent() with "not 
in GZIP format")

The exception thrown by HttpSolrClient is:
java.nio.charset.UnsupportedCharsetException: gzip
   at java.nio.charset.Charset.forName(Charset.java:531)
   at 
org.apache.http.entity.ContentType.create(ContentType.java:271)
   at 
org.apache.http.entity.ContentType.create(ContentType.java:261)
   at org.apache.http.entity.ContentType.parse(ContentType.java:319)
   at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:591)
   at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
   at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
   at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
   at 
org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1015)
   at 
org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1031)
   at 
org.apache.solr.client.solrj.SolrClient$$FastClassBySpringCGLIB$$7fcf36a0.invoke()
   at 
org.springframework.cglib.

Fuzzy search not working

2020-04-13 Thread Deepu
Hi Team,

We have 8 text fields (*_txt_en) in the schema and one multi-valued text field
which is a copyField target of the other text fields, like below.

title_txt_en, configuration_summary_txt_en, all_text_txt_ens (multi-valued
field)

Observed one issue with fuzzy match: the same term with a distance of two (~2)
works on individual fields but does not return any results from the
multi-valued field.

The term we used is "prob" and the document has the term "problem" in two text
fields, so the all_text field has two occurrences of "problem".



title_txt_en:prob~2 (gives results)

all_text_txt_ens:prob~2 (no results)



Are there any other factors involved in the distance calculation other
than the Damerau-Levenshtein distance algorithm?

What might be the reason the same input with the same distance worked with one
field and failed with the other field in the same collection?

Is there a way we can get the actual distance Solr calculated for a specific
document and a specific field?



Thanks in advance !!


Thanks,

Pradeep


Fuzzy match issue

2020-04-13 Thread Pradeep Kumar Kolluri (V)
We have 8 text fields (*_txt_en) in the schema and one multi-valued text field 
which is a copyField target of the other text fields, like below.
title_txt_en, configuration_summary_txt_en, all_text_txt_ens (multi-valued 
field)
Observed one issue with fuzzy match: the same term with a distance of two (~2) 
works on individual fields but does not return any results from the multi-valued 
field.
The term we used is "prob" and the document has the term "problem" in two text 
fields, so the all_text field has two occurrences of "problem".
 
title_txt_en:prob~2 (gives results)
all_text_txt_ens:prob~2 (no results)
 
Are there any other factors involved in the distance calculation other than the 
Damerau-Levenshtein distance algorithm?
What might be the reason the same input with the same distance worked with one 
field and failed with the other field in the same collection?
Is there a way we can get the actual distance Solr calculated for a specific 
document and a specific field?
 
Thanks in advance !!
 
Thanks,
Pradeep Kumar Kolluri






SolrJ connection leak with SolrCloud and Jetty Gzip compression enabled

2020-04-13 Thread Samuel Garcia Martinez
Hi!

Today, I've seen a weird issue in production workloads when the gzip 
compression was enabled. After some minutes, the client app ran out of 
connections and stopped responding.

The cluster setup is pretty simple:
Solr version: 7.7.2
Solr cloud enabled
Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 
HTTP LB using Round Robin over all nodes
All cluster nodes have gzip enabled for all paths, all HTTP verbs and all MIME 
types.
Solr client: HttpSolrClient targeting the HTTP LB

Problem description: when the Solr node that receives the request has to 
forward the request to a Solr Node that actually can perform the query, the 
response headers are added incorrectly to the client response, causing the 
SolrJ client to fail and to never release the connection back to the pool.

To simplify the case, let's try to start from the following repro scenario:

  *   Start one node with cloud mode and port 8983
  *   Create one single collection (1 shard, 1 replica)
  *   Start another node with port 8984 and the previously started zk (-z 
localhost:9983)
  *   Start a java application and query the cluster using the node on port 
8984 (the one that doesn't host the collection)

So, the steps occur like:

  *   The application queries node:8984 with compression enabled 
("Accept-Encoding: gzip") and wt=javabin
  *   Node:8984 can't perform the query and creates an HTTP request behind the 
scenes to node:8983
  *   Node:8983 returns a gzipped response with "Content-Encoding: gzip" and 
"Content-Type: application/octet-stream"
  *   Node:8984 adds the "Content-Encoding: gzip" value as the character encoding of 
the response (it should be forwarded as a "Content-Encoding" header, not as the 
character encoding)
  *   HttpSolrClient receives a "Content-Type: 
application/octet-stream;charset=gzip", causing an exception.
  *   HttpSolrClient tries to quietly close the connection, but since the 
stream is broken, the Utils.consumeFully fails to actually consume the entity 
(it throws another exception in GzipDecompressingEntity#getContent() with "not 
in GZIP format")
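
A minimal SolrJ sketch of the client side of this repro (collection name and ports 
are placeholders; allowCompression(true) is what makes the client send 
"Accept-Encoding: gzip"):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class GzipReproClient {
        public static void main(String[] args) throws Exception {
            // Query the node that does NOT host the collection (8984), with compression enabled.
            try (HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8984/solr/test")
                    .allowCompression(true)
                    .build()) {
                QueryResponse rsp = client.query(new SolrQuery("*:*")); // wt=javabin is the SolrJ default
                System.out.println(rsp.getResults().getNumFound());
            }
        }
    }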

The exception thrown by HttpSolrClient is:
java.nio.charset.UnsupportedCharsetException: gzip
   at java.nio.charset.Charset.forName(Charset.java:531)
   at 
org.apache.http.entity.ContentType.create(ContentType.java:271)
   at 
org.apache.http.entity.ContentType.create(ContentType.java:261)
   at org.apache.http.entity.ContentType.parse(ContentType.java:319)
   at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:591)
   at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
   at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
   at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
   at 
org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1015)
   at 
org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1031)
   at 
org.apache.solr.client.solrj.SolrClient$$FastClassBySpringCGLIB$$7fcf36a0.invoke()
   at 
org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)

Here I can see three different problems:

  *   HttpSolrCall should not use HttpServletResponse#setCharacterEncoding to 
set the Content-Encoding header. This is obviously a typo.
  *   HttpSolrClient, specifically the HttpClientUtil, should be modified so that 
if the Content-Encoding header lies about the actual content, an exception is still 
thrown but the connection isn't leaked forever.
  *   HttpSolrClient should allow clients to customize HttpClient's 
connectionRequestTimeout, so the application isn't prevented from responding to any 
other incoming request because all threads could be blocked forever 
waiting for a free connection that will never be free.

I think the first two points are bugs and the third one is a feature improvement. 
Unless I missed something, I'll file the two bugs and provide a patch for them. 
The same goes for the feature improvement.







Re: Proper way to manage managed-schema file

2020-04-13 Thread Alexandre Rafalovitch
If you are using the API (which the Admin UI does), the regenerated file will
lose comments and sort everything in a particular order. That's just
the implementation at the moment.

If you don't like that, you can always modify the schema file by hand
and reload the core to pick up the changes. You can even set the schema
to be immutable to avoid accidentally modifying it through the API.

The other option is not to keep comments in that file; then, after the
first rewrite, subsequent rewrites are quite incremental and make it easy
to track the changes.
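
As a rough illustration of the "script of schema calls" approach (a sketch only;
base URL, core name, and field attributes are placeholders):

    import java.util.LinkedHashMap;
    import java.util.Map;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;
    import org.apache.solr.client.solrj.request.schema.SchemaRequest;

    public class ScriptedSchemaChange {
        public static void main(String[] args) throws Exception {
            try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
                Map<String, Object> field = new LinkedHashMap<>();
                field.put("name", "title_txt_en");
                field.put("type", "text_en");
                field.put("stored", true);

                // Each change is one Schema API call; keeping these calls in a small
                // program (or script) is one way to make changes replayable.
                new SchemaRequest.AddField(field).process(client, "mycore");

                // If you instead edit managed-schema by hand, reload the core so the
                // change is picked up.
                CoreAdminRequest.reloadCore("mycore", client);
            }
        }
    }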

Regards,
   Alex.

On Mon, 6 Apr 2020 at 14:11, TK Solr  wrote:
>
> I am using Solr 8.3.1 in non-SolrCloud mode (what should I call this mode?) 
> and
> modifying managed-schema.
>
> I noticed that Solr does override this file wiping out all my comments and
> rearranging the order. I noticed there is a "DO NOT EDIT" comment. Then, what 
> is
> the proper/expected way to manage this file? Admin UI can add fields but 
> cannot
> edit existing one or add new field types. Do I keep a script of many schema
> calls? (Then how do I reset the default to the initial one, which would be
> needed before re-re-playing the schema calls.)
>
> TK
>
>


Re: how to use multiple update process chain?

2020-04-13 Thread Alexandre Rafalovitch
You can only have one chain at a time.

You can, however, create your custom URP chain to contain
configuration from all three.

Or, if you do use multiple chains that are configured similarly, you
can pull each URP into its own definition and then mix and match them
either in a chain or even per request (or in request defaults):
https://lucene.apache.org/solr/guide/8_5/update-request-processors.html#configuring-individual-processors-as-top-level-plugins
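
A minimal SolrJ sketch of the per-request option, assuming "composite-id" and
"deduplicate-taxonomy" have been defined as individual top-level updateProcessor
plugins (the names and URL here are placeholders, not the chain names from the
question):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class PerRequestProcessors {
        public static void main(String[] args) throws Exception {
            try (SolrClient client =
                    new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("title_txt_en", "example");

                UpdateRequest req = new UpdateRequest();
                req.add(doc);
                // Run individually defined (top-level) processors for this request only.
                req.setParam("processor", "composite-id,deduplicate-taxonomy");
                req.process(client);
                client.commit();
            }
        }
    }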

Regards,
   Alex.

On Sat, 11 Apr 2020 at 15:16, derrick cui
 wrote:
>
> Hi,
> I need to do three tasks:
> 1. add-unknown-fields-to-the-schema
> 2. create composite key
> 3. remove duplicate for specified field
> I defined update.chain as below, but only the first one works; the others 
> don't. Please help. Thanks
> 
>   
> add-unknown-fields-to-the-schema
> composite-id
> deduplicateTaxonomy
>   
> 
>  default="${update.autoCreateFields:true}"
>  
> processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
>
>   
>   
> 
> 
>   
> _gl_collection
> _gl_id
>   
>   
> id
> _gl_id
>   
>   
> _gl_id
> -
>   
>   
>   
> 
> 
>   
> _gl_dp_.*
> _gl_ss_score_.*
>   
>   
>   
> 
> thanks
>


Re: Required operator (+) is being ignored when using default conjunction operator AND

2020-04-13 Thread Chris Hostetter
On Sat, 11 Apr 2020, Eran Buchnick wrote:

: Date: Sat, 11 Apr 2020 23:34:37 +0300
: From: Eran Buchnick 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: Required operator (+) is being ignored when using default
: conjunction operator AND
: 
: Hoss, thanks a lot for the informative response. I understood my
: misunderstanding with infix and prefix operators. Need to rethink about the
: term occurrence support in my search service.

Sure,

And I appreciate you asking about this -- it spurred me to file this...

https://issues.apache.org/jira/browse/LUCENE-9315


: On Mon, Apr 6, 2020, 20:43 Chris Hostetter  wrote:
: 
: >
: > : I read your attached blog post (and more) but still the penny hasn't
: > dropped
: > : yet about what causes the operator clash when the default operator is
: > AND.
: > : I read that when q.op=AND, OR will change the left (if not MUST_NOT) and
: > : right clause Occurs to SHOULD - what that means is that the "order of
: > : operations" in this case is giving the infix operator the mandate to
: > : control the prefix operator?
: >
: > Not quite anything that complex... sorry, but the blog post was focused
: > on describing *what* happens when parsing, to explain why mixing prefix/infix is
: > bad ... I avoided getting bogged down into *why* it happens exactly the
: > way it does.
: >
: >
: > To get to the "why" you have to circle back to the higher level concept
: > that the "prefix" operators very closely align to the underlying concepts
: > of the BooleanQuery/BooleanClause data structures: that each clause has an
: > "Occur" property which is either: MUST/SHOULD/MUST_NOT (or FILTER, but
: > setting aside scoring that's functionally equivalent to MUST).
: >
: > The 'infix' operators just manipulate the Occur property of the clauses on
: > either side of them.
: >
: > 'q.op=AND' and 'q.op=OR' are functionally really about setting the
: > "Default Occur Value For All Clauses That Do Not Have An Explicit Occur
: > Value" (ie: q.op=Occur.MUST and q.op=Occur.SHOULD) ... where the explicit
: > Occur value for each clause would be specified by its prefix (+=MUST,
: > -=MUST_NOT ... there is no supported prefix for SHOULD, which is why
: > q.op=SHOULD is the default and changing it complicates the parser logic)
: >
: > In essence: After the q.op/default.occur is applied to all clauses (that
: > don't already have a prefix), then there is a left to right parsing that
: > lets the infix operators modify the "Occur" values of the clauses on
: > either side of them -- if those Occur values match the "default" for this
: > parser.
: >
: > So let's imagine 2 requests...
: >
: > 1)  {!q.op=AND}a +b OR c +d AND e
: > 2)  {!q.op=OR} x +y OR z +r AND s
: >
: > Here's what those wind up looking like internally with the default
: > applied...
: >
: > 1) q.op=MUST:MUST(a)   MUST(b) OR MUST(c)   MUST(d) AND MUST(e)
: > 2) q.op=SHOULD:  SHOULD(x) MUST(y) OR SHOULD(z) MUST(r) AND SHOULD(s)
: >
: > And here's how the infix operators change things as it parses left to
: > right building up the clauses...
: >
: > 1) q.op=MUST:MUST(a)   SHOULD(b) SHOULD(c) MUST(d)  MUST(e)
: > 2) q.op=SHOULD:  SHOULD(x) MUST(y)   SHOULD(z) MUST(r)  MUST(s)
: >
: > It's not actually done in "two passes" -- it's just that as the parsing
: > is done left to right, the default Occur is used unless/until set by a
: > prefix operator, and infix operators not only set the occur value
: > for the "next" clause, but also reach back to override the prior
: > Occur value if it matches the Default: because there is no "history" kept
: > to indicate that it was explicitly set, or how. The left to right parsing
: > just does the best it can with the context it's got.
: >
: > :  A little background - I am trying to implement a google search like
: > : service and want to have the ability to have required and prohibit
: > : operators while still allowing default intersection operation as default
: > : operator. How can I achieve this with this limitation?
: >
: > If you want "intersection" to be the default, I'm not sure why you care
: > about having a "required" operator? (You didn't mention anything about an
: > "optional" operator even though your original example explicitly used
: > "OR" ... so I'm not really sure if that was just a contrived example or if you
: > actually care about supporting it.)
: >
: > If you're not hung up on using a specific syntax, you might want to
: > consider the "simple" QParser -- it unfortunately re-uses the 'q.op=AND'
: > param syntax to indicate what the default Occur should be for clauses, but
: > the overall syntax is much simpler: there is a prefix negation operator,
: > but otherwise the infix "+" and "|" operators support boolean AND and OR
: > -- there are no prefix operators for MUST/SHOULD. You can also turn off
: > individual operators you don't like...
: >
: >
: > 
https://lucene.apache.org/solr/guide/8_5/other-parsers.html#OtherParsers-SimpleQ
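
A minimal SolrJ sketch of trying the "simple" parser (the operator list and query
string are illustrative only; see the page above for the full syntax):

    import org.apache.solr.client.solrj.SolrQuery;

    public class SimpleParserQuery {
        public static void main(String[] args) {
            // "-" is prefix negation; "+" and "|" are infix AND/OR; q.op sets the
            // default Occur for clauses without an explicit operator.
            SolrQuery q = new SolrQuery("apache + (solr | lucene) -elasticsearch");
            q.set("defType", "simple");
            q.set("q.op", "AND");
            q.set("q.operators", "AND,OR,NOT,PRECEDENCE"); // enable only the operators you want
            System.out.println(q);
        }
    }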

Re: Nested Document with replicas slow

2020-04-13 Thread Michael Gibney
Depending on how you're measuring performance (and whether your use case
benefits from caching), it might be worth looking into stable replica
routing (configured with the "replica.base" sub-parameter of the
shards.preference parameter).
With a single replica per shard, every request is routed to a single
replica for a given shard, ensuring effective use of replica-level caches.
With multiple replicas per shard, by default each request is routed
randomly to specific replicas. The more shards you have (and the more
replicas), the longer it takes to "warm" caches to the point where the user
actually perceives decreased latency. For replication factor > 1, stable
cache entries can be initialized by warming queries, but transient cache
entries (particularly the queryResultCache) can in some cases be rendered
effectively useless in combination with the default random replica routing.
More discussion can be found at SOLR-13257.

To be sure, this may not affect your case, but if you're seeing performance
degradation associated with adding replicas, it's probably worth
considering.
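
For anyone wanting to try this from SolrJ, a sketch of setting the parameter on a
query; the replica.base value shown is illustrative only, and the exact syntax
should be taken from SOLR-13257 / the Ref Guide:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.common.params.ShardParams;

    public class StableReplicaRouting {
        public static void main(String[] args) {
            SolrQuery q = new SolrQuery("*:*");
            // Prefer a deterministic ordering among otherwise-equivalent replicas so
            // repeated requests hit the same replica and reuse its caches.
            q.set(ShardParams.SHARDS_PREFERENCE, "replica.base:stable"); // value is illustrative
            System.out.println(q);
        }
    }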


On Mon, Apr 13, 2020 at 9:48 AM Jae Joo  wrote:

> I have multiple 100 M documents using Nested Document for joining. It is
> the fastest way for joining in a single replica. By adding more replicas (2
> or 3), the performance is slow down significantly. (about 100x times).
> Does anyone have same experience?
>
> Jae
>


Nested Document with replicas slow

2020-04-13 Thread Jae Joo
I have several hundred million documents using nested documents for joining. It is
the fastest way to join with a single replica. After adding more replicas (2
or 3), the performance slows down significantly (about 100x).
Does anyone have the same experience?

Jae


Re: how to get rid of double quotes in solr

2020-04-13 Thread sefty nindyastuti
The picture is the output from Logstash; I use a Logstash config to
accept input from the file and then output to Solr.


On Mon, Apr 13, 2020 at 19:07, Erick Erickson 
wrote:

> I don’t quite know what you’re asking about. Is that input to Solr? Or is
> it output from logstash?
>
> What are you indexing? Because that doesn't look like data from a solr log.
>
> You might want to review: https://wiki.apache.org/solr/UsingMailingLists
>
> Best,
> Erick
>
> > On Apr 13, 2020, at 12:24 AM, sefty nindyastuti 
> wrote:
> >
> > I have a problem when indexing log data clusters in solr using logstash
> and filebeat. there are double quotes in the solr index results,
> > how to solve this problem, please help
> >
> > expect the results of the index that appears in solr as below:
> >
> >  {
> > "input": "log",
> > "hostname": "localhost",
> > "id": "22eddbc9-e60f-29cd-a352-b40154ba1736",
> > "type": "filebeat",
> > "ephemeral_id": "1a31d6e0-8ed9-1307-215f-5dfd361364c9",
> > "version": "7.6.1",
> > "offset": "2061794 ",
> > "path": " /var/log/hadoop/hdfs/hadoop-hdfs-secondarynamenode-xx.log ",
> > "host": "localhostxxx",
> > "message": "2020-04-11 19: 04: 28,575 INFO common.Util
> (Util.java:receiveFile(314)) - Combined time for file downloads and fsync
> to all disks stores 0.02s. The file download stores 0.02s at 58750.00 KB /
> s Synchronous (fsync) write to disk of / hadoop / hdfs / namesecondary /
> current / edits_tmp_ "
> > }
> >
>
>


Re: how to get rid of double quotes in solr

2020-04-13 Thread Erick Erickson
I don’t quite know what you’re asking about. Is that input to Solr? 
Or is it output from logstash?

What are you indexing? Because that doesn't look like data from a solr log.

You might want to review: https://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

> On Apr 13, 2020, at 12:24 AM, sefty nindyastuti  wrote:
> 
> I have a problem when indexing log data clusters in solr using logstash and 
> filebeat. there are double quotes in the solr index results,
> how to solve this problem, please help
> 
> expect the results of the index that appears in solr as below:
> 
>  {
> "input": "log",
> "hostname": "localhost",
> "id": "22eddbc9-e60f-29cd-a352-b40154ba1736",
> "type": "filebeat",
> "ephemeral_id": "1a31d6e0-8ed9-1307-215f-5dfd361364c9",
> "version": "7.6.1",
> "offset": "2061794 ",
> "path": " /var/log/hadoop/hdfs/hadoop-hdfs-secondarynamenode-xx.log ",
> "host": "localhostxxx",
> "message": "2020-04-11 19: 04: 28,575 INFO common.Util 
> (Util.java:receiveFile(314)) - Combined time for file downloads and fsync to 
> all disks stores 0.02s. The file download stores 0.02s at 58750.00 KB / s 
> Synchronous (fsync) write to disk of / hadoop / hdfs / namesecondary / 
> current / edits_tmp_ "
> }
> 



Queries on adding headers to solrj Request

2020-04-13 Thread dinesh naik
Hi all,
We are planning to add security to Solr using . For this we are adding some
information to the headers of each SolrJ request. These requests will be
intercepted by an application (proxy) on the Solr VM and then routed to
Solr (considering the Solr port as 8983).
Could you please answer the queries below:
 1. Are there any APIs (paths) that a Solr client cannot access and only Solr
uses for intra-node communication?
 2. As the SolrJ client will add headers, intra-node communication from Solr
also needs to add these headers (like a ping request from Solr node 1 to
Solr node 2). Can Solr add custom headers for intra-node communication?
 3. Apart from port 8983, are there any other ports Solr uses for intra-node
communication?
 4. How do we add headers to CloudSolrClient? (A sketch of what we are considering
is below.)
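
For question 4, the kind of thing we are considering is a sketch like the following,
assuming the default internal HttpClient that SolrJ builds through HttpClientUtil is
used (the header name, ZooKeeper address, and collection are placeholders):

    import java.util.Collections;
    import java.util.Optional;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.impl.HttpClientUtil;

    public class HeaderAddingClient {
        public static void main(String[] args) throws Exception {
            // Every HttpClient created through HttpClientUtil afterwards runs this
            // interceptor, so outgoing requests carry the extra header.
            HttpClientUtil.addRequestInterceptor(
                    (request, context) -> request.addHeader("X-Proxy-Auth", "secret-token"));

            try (CloudSolrClient client = new CloudSolrClient.Builder(
                    Collections.singletonList("zkhost:2181"), Optional.empty()).build()) {
                client.setDefaultCollection("mycollection");
                client.query(new SolrQuery("*:*"));
            }
        }
    }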

-- 
Best Regards,
Dinesh Naik