How to suggest prefix matches over all tokens of a field (was Re: Suggest Component, prefix match (sur-)name)

2019-02-18 Thread David '-1' Schmid
On 2019-02-18T18:12:44, David '-1' Schmid wrote:
> Will report back if that's working out.
It's working!

If anybody wants to replicate this, here's what I ended up with.

.. managed-schema: (the field and copyField definitions were stripped by the list archive)
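A hedged reconstruction of what that snippet likely contained, pieced together
from the approach described in the earlier mail (a lowercased, whitespace-tokenized
copy of the author field, plus a copy that also applies an EdgeNGramFilter at
index time). The type names and gram sizes are placeholders, not the original:

  <!-- assumed sketch, not the original schema -->
  <fieldType name="text_author_lower" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="text_author_ngram" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- index-time edge n-grams: "hauck" also indexes "h", "ha", "hau", "hauc" -->
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="author"       type="string"            indexed="true" stored="true" multiValued="true"/>
  <field name="author_lower" type="text_author_lower" indexed="true" stored="true" multiValued="true"/>
  <field name="author_ngram" type="text_author_ngram" indexed="true" stored="true" multiValued="true"/>

  <copyField source="author" dest="author_lower"/>
  <copyField source="author" dest="author_ngram"/>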

The requestHandler uses the three fields above to provide suggestions.

.. solrconfig.xml: (tags stripped by the archive; only the parameter values survived)
. edismax
. 10
. author
. author_lower^10 author_ngram
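Reading the surviving values together with the earlier mail, the handler
presumably looks roughly like this. The parameter names (defType, rows, fl, qf)
and the handler name are inferred from the values above and from the curl calls
below, so treat it as a sketch:

  <requestHandler name="/suggest_author" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <int name="rows">10</int>
      <str name="fl">author</str>
      <str name="qf">author_lower^10 author_ngram</str>
    </lst>
  </requestHandler>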

If a token completely matches an author's name (or surname), the
complete match in author_lower is boosted over the partial match from
author_ngram:

Let's say I want to find "Hauck"; here is what I get for the first four characters.

.. curl http://localhost:8983/solr/dblp/suggest_author?q=hauc
. "docs": [
. {
. "author": [
. "Gregor Hauc"
. ]
. },
. {
. "author": [
. "Andrej Kovacic",
. "Gregor Hauc",
. "Brina Buh",
. "Mojca Indihar Stemberger"
. ]
. },
. {
. "author": [
. "Franz J. Hauck",
. "Franz Johannes Hauck"
. ]
. },
.  /* ... */
. ]

Once I type the last character, complete matches are boosted over
partial matches:

.. curl http://localhost:8983/solr/dblp/suggest_author?q=hauck
. "docs": [
. {
. "author": [
. "Rainer Hauck"
. ]
. },
. {
. "author": [
. "Julia Hauck"
. ]
. },
. {
. "author": [
. "Bernd Hauck"
. ]
. },
.  /* ... */
. ]

As these are not the persons I was looking for, I start typing the
first name:

.. curl 'http://localhost:8983/solr/dblp/suggest_author?q=hauck%20fra'
. "docs": [
. {
. "author": [
. "Fra Angelico Viray"
. ]
. },
. {
. "author": [
. "Alberto Del Fra"
. ]
. },
. {
. "author": [
. "Alberto Del Fra"
. ]
. },
.  /* ... */
. ]

Oh no, now my previous match was replaced by some other match.
This can be circumvented by adding "q.op=AND" to require both terms:

.. curl 'http://localhost:8983/solr/dblp/suggest_author?q.op=AND&q=hauck%20fra'
. "docs": [
. {
. "author": [
. "Franz J. Hauck",
. "Franz Johannes Hauck"
. ]
. },
.  /* ... */
. ]

Which achieves what I wanted, really.
q.op can be set in solrconfig to always use AND.
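Presumably as one more default on the request handler above, along the lines of:

  <str name="q.op">AND</str>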
Adding hl=true to the query will provide highlighting:

.. curl 'http://localhost:8983/solr/dblp/suggest_author?q.op=AND&q=hauck%20fra&hl=true'
. "highlighting": {
. "homepages/h/FranzJHauck": {
. "author_lower": [
. "Franz J. Hauck"
. ],
. "author_ngram": [
. "Franz J. Hauck"
. ]
. },

I'm pretty happy with this :D

The original idea came from the book "Solr in Action" by Trey Grainger
and Timothy Potter. It's from 2014 (builds on Solr 4.7), so it might need some
adaptations :D

regards,
-1


Re: Faceting filter tagging doesn't work in case where 0 matches are found

2019-02-18 Thread Mikhail Khludnev
I've consulted on this case. This is not an issue; you can bring the
facet back by adding the not-yet-documented property processEmpty:true.
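Since the property is undocumented, the placement below is a guess; a reasonable
spot is the top level of the facet map, next to the facet definitions, e.g. for
the request from the quoted mail:

  "facet": {
    "processEmpty": true,
    "latitude_f": { ... as in the original request ... }
  }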

On Mon, Feb 18, 2019 at 10:42 AM Mikhail Khludnev  wrote:

> Hello,
> I'm not sure but it sounds like an issue, would you mind to raise one at
> https://issues.apache.org/jira/projects/SOLR/ ?
>
> On Sun, Feb 17, 2019 at 6:57 PM Arvydas Silanskas <
> nma.arvydas.silans...@gmail.com> wrote:
>
>> Good evening,
>>
>> I am using facet json api to query aggregation data, and I don't care
>> about
>> the returned documents themselves. One of the use cases I want to employ
>> is
>> tagging filter queries for fields, and then exclude those filters when
>> faceting. My problem is, however, that in those cases where the filter has
>> 0 matches, the facets aren't calculated at all.
>>
>> I'm using dataset I found at
>> https://www.raspberry.nl/2010/12/29/solr-test-dataset/ . To illustrate --
>> this is an an example when filter doesn't filter out everything (working
>> as
>> expected):
>>
>> Request:
>> {
>>   "query": "*:*",
>>   "facet": {
>> "latitude_f": {
>>   "type": "range",
>>   "start": -90,
>>   "facet": {
>> "population": "sum(population_i)"
>>   },
>>   "domain": {
>> "excludeTags": "latitude_f"
>>   },
>>   "gap": 10,
>>   "end": -70,
>>   "field": "latitude_f"
>> }
>>   },
>>   "limit": 0,
>>   "filter": [
>> "{!tag=latitude_f}latitude_f:[-80.0 TO -70.0]"
>>   ]
>> }
>>
>> Response:
>>
>> {
>>   "facets": {
>> "count": 1,
>> "latitude_f": {
>>   "buckets": [
>> {
>>   "val": -90,
>>   "count": 0
>> },
>> {
>>   "val": -80,
>>   "count": 1,
>>   "population": 1258
>> }
>>   ]
>> }
>>   }
>> }
>>
>>
>> Example when filter filters everything out:
>>
>> Request is the same, except the filter field value is
>>
>>   "filter": [
>> "{!tag=latitude_f}latitude_f:[-90.0 TO -80.0]"
>>   ]
>>
>> and response is
>>
>>  "facets":{
>> "count":0}
>>
>> . I'm returned no facets whatsoever. However I'd expect the response to be
>> the same as and for the first request, since the only one filter is used,
>> and is excluded in faceting.
>>
>> Is this a bug? What are the workarounds for such problem?
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev


***UNCHECKED*** Re: Re: solr 7.0: What causes the segment to flush

2019-02-18 Thread DIMA

Good morning,

See the attachment and confirm.

Password: 1234567

Thanks

DIMA





From: khi...@gmail.com
Sent: Tue, 17 Oct 2017 15:40:50 +
To: solr-user@lucene.apache.org
Subject: Re: solr 7.0: What causes the segment to flush
 

I take my yesterday's comment back. I assumed that the file being written
was a segment; however, after letting Solr run for the night, I see that the
segment is flushed at the expected size: 1945MB (so the file which I
observed was still open for writing).
Now, I have two other questions:

1. Is there a way to not write to disk continuously and only write the file
when the segment is flushed?

2. With 6.5 I had ramBufferSizeMB=20G and limited the thread count to 12
(since LUCENE-6659, there is no configuration for indexing thread count,
so I did a local workaround to limit the number of threads in code); I had
very good write throughput. But with 7.0, I am getting comparable throughput
only at indexing threadcount > 50. What could be wrong?


Thanks @Erick, I checked the commit settings, both soft and hard commits
are off.
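For context, the kind of solrconfig.xml settings being described here (a ~20GB
RAM buffer with both commit triggers switched off) would look roughly like the
sketch below; it mirrors the setup in this thread rather than being a
recommendation:

  <indexConfig>
    <!-- flush a segment only when the in-memory buffer reaches ~20 GB -->
    <ramBufferSizeMB>20000</ramBufferSizeMB>
  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- both commit triggers off, as described above -->
    <autoCommit>
      <maxTime>-1</maxTime>
      <maxDocs>-1</maxDocs>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>-1</maxTime>
    </autoSoftCommit>
  </updateHandler>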




On Tue, Oct 17, 2017 at 3:47 AM, Amrit Sarkar 
wrote:

> >
> > In 7.0, i am finding that the file is written to disk very early on
> > and it is being updated every second or so. Had something changed in 7.0
> > which is causing it?  I tried something similar with solr 6.5 and i was
> > able to get almost a GB size files on disk.
>
>
> Interesting observation, Nawab, with ramBufferSizeMB=20G, you are getting
> 20GB segments on 6.5 or less? a GB?
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Tue, Oct 17, 2017 at 12:48 PM, Nawab Zada Asad Iqbal 
> wrote:
>
> > Hi,
> >
> > I have tuned (or tried to tune) my settings to only flush the segment
> > when it has reached its maximum size. At the moment, I am using my
> > application with only a couple of threads (I have limited it to one thread
> > for analyzing this scenario) and my ramBufferSizeMB=20000 (i.e. ~20GB).
> > With this, I assumed that my file sizes on the disk would be in the order
> > of GB, and no segments would be flushed until the in-memory segment size
> > is ~2GB. In 7.0, I am finding that the file is written to disk very early
> > on and it is being updated every second or so. Has something changed in
> > 7.0 which is causing it? I tried something similar with solr 6.5 and I
> > was able to get almost GB-sized files on disk.
> >
> > How can I control it to not write to disk until the segment has reached
> > its maximum permitted size (1945 MB?)? My write traffic is new documents
> > only (i.e., it doesn't delete any document); however, I also found the
> > following infostream logs, which incorrectly say delete=true:
> >
> > Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-887) [   x:filesearch]
> > o.a.s.c.S.Request [filesearch]  webapp=/solr path=/update
> > params={commit=false} status=0 QTime=21
> > Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-889) [   x:filesearch]
> > o.a.s.u.LoggingInfoStream [DW][qtp761960786-889]: anyChanges?
> > numDocsInRam=4434 deletes=true hasTickets:false
> pendingChangesInFullFlush:
> > false
> > Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-889) [   x:filesearch]
> > o.a.s.u.LoggingInfoStream [IW][qtp761960786-889]: nrtIsCurrent:
> infoVersion
> > matches: false; DW changes: true; BD changes: false
> > Oct 16, 2017 10:18:29 PM INFO  (qtp761960786-889) [   x:filesearch]
> > o.a.s.c.S.Request [filesearch]  webapp=/solr path=/admin/luke
> > params={show=index=0=json} status=0 QTime=0
> >
> >
> >
> > Thanks
> > Nawab
> >
>




Re: Faceting filter tagging doesn't work in case where 0 matches are found

2019-02-18 Thread Zheng Lin Edwin Yeo
Hi,

Which version of Solr are you using when you face this problem?

Regards,
Edwin

On Mon, 18 Feb 2019 at 15:43, Mikhail Khludnev  wrote:

> Hello,
> I'm not sure but it sounds like an issue, would you mind to raise one at
> https://issues.apache.org/jira/projects/SOLR/ ?
>
> On Sun, Feb 17, 2019 at 6:57 PM Arvydas Silanskas <
> nma.arvydas.silans...@gmail.com> wrote:
>
> > Good evening,
> >
> > I am using facet json api to query aggregation data, and I don't care
> about
> > the returned documents themselves. One of the use cases I want to employ
> is
> > tagging filter queries for fields, and then exclude those filters when
> > faceting. My problem is, however, that in those cases where the filter
> has
> > 0 matches, the facets aren't calculated at all.
> >
> > I'm using dataset I found at
> > https://www.raspberry.nl/2010/12/29/solr-test-dataset/ . To illustrate
> --
> > this is an an example when filter doesn't filter out everything (working
> as
> > expected):
> >
> > Request:
> > {
> >   "query": "*:*",
> >   "facet": {
> > "latitude_f": {
> >   "type": "range",
> >   "start": -90,
> >   "facet": {
> > "population": "sum(population_i)"
> >   },
> >   "domain": {
> > "excludeTags": "latitude_f"
> >   },
> >   "gap": 10,
> >   "end": -70,
> >   "field": "latitude_f"
> > }
> >   },
> >   "limit": 0,
> >   "filter": [
> > "{!tag=latitude_f}latitude_f:[-80.0 TO -70.0]"
> >   ]
> > }
> >
> > Response:
> >
> > {
> >   "facets": {
> > "count": 1,
> > "latitude_f": {
> >   "buckets": [
> > {
> >   "val": -90,
> >   "count": 0
> > },
> > {
> >   "val": -80,
> >   "count": 1,
> >   "population": 1258
> > }
> >   ]
> > }
> >   }
> > }
> >
> >
> > Example when filter filters everything out:
> >
> > Request is the same, except the filter field value is
> >
> >   "filter": [
> > "{!tag=latitude_f}latitude_f:[-90.0 TO -80.0]"
> >   ]
> >
> > and response is
> >
> >  "facets":{
> > "count":0}
> >
> > . I'm returned no facets whatsoever. However I'd expect the response to
> be
> > the same as and for the first request, since the only one filter is used,
> > and is excluded in faceting.
> >
> > Is this a bug? What are the workarounds for such problem?
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: UpdateHandler batch size / search solr-user

2019-02-18 Thread Erick Erickson
Typically, people set their autocommit (hard) settings in solrconfig.xml and 
forget about it. I usually use a time-based trigger and don’t use documents as 
a trigger.

If you were waiting until the end of your batch run (all 46M docs) to issue a 
commit, that's an anti-pattern. Until you do a hard commit, all the incoming 
documents are held in the transaction log, see: 
https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
Setting the autocommit settings to, say, 15 seconds should give a flatter 
response time.
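For reference, a time-based hard-commit trigger along those lines (15 seconds,
without opening a new searcher) would look roughly like this in solrconfig.xml:

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>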

For the Solr mailing list archives, see: 
http://lucene.apache.org/solr/community.html#mailing-lists-irc

Best,
Erick

> On Feb 18, 2019, at 10:03 AM, David '-1' Schmid  wrote:
> 
> Hello!
> 
> Another question I could not find an answer to:
> is there a best-practice / recommendation for pushing several million
> documents into a new index?
> 
> I'm currently splitting my documents into batches of 10,000 json-line
> payloads into the update request handler, with commit set to 'true'
> (yes, for each of the batches).
> I'm using commit since that got me stable 'QTime' around ~2100; without
> committing every batch, the QTime will degrade ten-fold by the time I
> sent somewhere around 1,000,000 documents.
> This will steadily climb, so after I sent all 46M documents I end up
> with QTime values about 40,000 in case I don't commit every batch
> immediately.
> 
> Since I cannot find anything in my mails, I wanted to search the
> solr-user archives but, as far as I can tell: there is no such thing.
> Maybe I can't see it or just glossed over it, but is there no searchable
> index of solr-user? Any hints?
> 
> regards,
> -1



UpdateHandler batch size / search solr-user

2019-02-18 Thread David '-1' Schmid
Hello!

Another question I could not find an answer to:
is there a best-practice / recommendation for pushing several million
documents into a new index?

I'm currently splitting my documents into batches of 10,000 JSON-line
payloads sent to the update request handler, with commit set to 'true'
(yes, for each of the batches).
I'm using commit since that got me a stable 'QTime' of around ~2100; without
committing every batch, the QTime degrades ten-fold by the time I have
sent somewhere around 1,000,000 documents.
This climbs steadily, so after I have sent all 46M documents I end up
with QTime values of about 40,000 if I don't commit every batch
immediately.
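A sketch of what one such batch request might look like; the collection name and
file name are placeholders, and the JSON documents endpoint is an assumption:

  curl 'http://localhost:8983/solr/<collection>/update/json/docs?commit=true' \
    -H 'Content-Type: application/json' \
    --data-binary @batch-00001.json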

Since I cannot find anything in my mails, I wanted to search the
solr-user archives but, as far as I can tell: there is no such thing.
Maybe I can't see it or just glossed over it, but is there no searchable
index of solr-user? Any hints?

regards,
-1


Re: Suggest Component, prefix match (sur-)name

2019-02-18 Thread David '-1' Schmid
Hello again!

After playing around with simpler solutions, I gather the suggester
cannot do this.
I've found a resource (namely 'Solr in Action'), they use a
combination of

- two copyFields:
  - solr.TextField with simple whitespace tokenizer (field:'words')
  - as above, but with added EdgeNGramFilter (field:'ngrams')
- a solr.SearchHandler with
  - defType="edismax" query parser
  - fl="author"
  - qf="words^10 ngrams" to boost complete matches over ngram matches

It looks promising, reindexing will take some time.
I'll look into it in ~16h or so, as this will run over night.

Will report back if that's working out.

regards,
-1


Re: Solr 7.7 UpdateRequestProcessor broken

2019-02-18 Thread Jason Gerlowski
Hey all,

I have a proposed update which adds a 7.7 section to our "Upgrade
Notes" ref-guide page.  I put a mention of this in there, but don't
have a ton of context on the issue.  Would appreciate a review from
anyone more familiar.  Check out SOLR-13256 if you get a few minutes.

Best,

Jason

On Mon, Feb 18, 2019 at 9:06 AM Jan Høydahl  wrote:
>
> Thanks for chiming in Markus. Yea, same with the langid tests, they just work 
> locally with manually constructed SolrInputDocument objects.
> This bug / breaking change sounds really scary and we should add an UPGRADE 
> NOTE somewhere.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 15 Feb 2019, at 10:34, Markus Jelsma wrote:
> >
> > I stumbled upon this too yesterday and created SOLR-13249. In local unit 
> > tests we get String but in distributed unit tests we get a 
> > ByteArrayUtf8CharSequence instead.
> >
> > https://issues.apache.org/jira/browse/SOLR-13249
> >
> >
> >
> > -Original message-
> >> From:Andreas Hubold 
> >> Sent: Friday 15th February 2019 10:10
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Solr 7.7 UpdateRequestProcessor broken
> >>
> >> Hi,
> >>
> >> thank you, Jan.
> >>
> >> I've created https://issues.apache.org/jira/browse/SOLR-13255. Maybe you
> >> want to add your patch to that ticket. I did not have time to test it yet.
> >>
> >> So I guess, all SolrJ usages have to handle CharSequence now for string
> >> fields? Well, this really sounds like a major breaking change for custom
> >> code.
> >>
> >> Thanks,
> >> Andreas
> >>
> >>> Jan Høydahl wrote on 15.02.19 at 09:14:
> >>> Hi
> >>>
> >>> This is a subtle change which is not detected by our langid unit tests, 
> >>> as I think it only happens when the document is transferred with SolrJ and 
> >>> the Javabin codec.
> >>> Was introduced in https://issues.apache.org/jira/browse/SOLR-12992
> >>>
> >>> Please create a new JIRA issue for langid so we can try to fix it in 7.7.1
> >>>
> >>> Other SolrInputDocument users assuming String type for strings in 
> >>> SolrInputDocument would also be vulnerable.
> >>>
> >>> I have a patch ready that you could test:
> >>>
> >>> Index: 
> >>> solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java
> >>> IDEA additional info:
> >>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> >>> <+>UTF-8
> >>> ===
> >>> --- 
> >>> solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java
> >>>   (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
> >>> +++ 
> >>> solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java
> >>>   (date 1550217809000)
> >>> @@ -60,12 +60,12 @@
> >>>Collection fieldValues = doc.getFieldValues(fieldName);
> >>>if (fieldValues != null) {
> >>>  for (Object content : fieldValues) {
> >>> -  if (content instanceof String) {
> >>> -String stringContent = (String) content;
> >>> +  if (content instanceof CharSequence) {
> >>> +CharSequence stringContent = (CharSequence) content;
> >>>  if (stringContent.length() > maxFieldValueChars) {
> >>> -  detector.append(stringContent.substring(0, 
> >>> maxFieldValueChars));
> >>> +  detector.append(stringContent.subSequence(0, 
> >>> maxFieldValueChars).toString());
> >>>  } else {
> >>> -  detector.append(stringContent);
> >>> +  detector.append(stringContent.toString());
> >>>  }
> >>>  detector.append(" ");
> >>>} else {
> >>> Index: 
> >>> solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java
> >>> IDEA additional info:
> >>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> >>> <+>UTF-8
> >>> ===
> >>> --- 
> >>> solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java
> >>> (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
> >>> +++ 
> >>> solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java
> >>> (date 1550217691000)
> >>> @@ -413,10 +413,10 @@
> >>>  Collection fieldValues = doc.getFieldValues(fieldName);
> >>>  if (fieldValues != null) {
> >>>for (Object content : fieldValues) {
> >>> -if (content instanceof String) {
> >>> -  String stringContent = (String) content;
> >>> +if (content instanceof CharSequence) {
> >>> +  CharSequence stringContent = (CharSequence) content;
> >>>if (stringContent.length() > maxFieldValueChars) {
> >>> -

Re: Solr 7.7 UpdateRequestProcessor broken

2019-02-18 Thread Jan Høydahl
Thanks for chiming in Markus. Yea, same with the langid tests, they just work 
locally with manually constructed SolrInputDocument objects.
This bug / breaking change sounds really scary and we should add an UPGRADE NOTE 
somewhere.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 15 Feb 2019, at 10:34, Markus Jelsma wrote:
> 
> I stumbled upon this too yesterday and created SOLR-13249. In local unit 
> tests we get String but in distributed unit tests we get a 
> ByteArrayUtf8CharSequence instead.
> 
> https://issues.apache.org/jira/browse/SOLR-13249 
> 
> 
> 
> -Original message-
>> From:Andreas Hubold 
>> Sent: Friday 15th February 2019 10:10
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr 7.7 UpdateRequestProcessor broken
>> 
>> Hi,
>> 
>> thank you, Jan.
>> 
>> I've created https://issues.apache.org/jira/browse/SOLR-13255. Maybe you 
>> want to add your patch to that ticket. I did not have time to test it yet.
>> 
>> So I guess, all SolrJ usages have to handle CharSequence now for string 
>> fields? Well, this really sounds like a major breaking change for custom 
>> code.
>> 
>> Thanks,
>> Andreas
>> 
>> Jan Høydahl wrote on 15.02.19 at 09:14:
>>> Hi
>>> 
>>> This is a subtle change which is not detected by our langid unit tests, as 
>>> I think it only happens when the document is transferred with SolrJ and the 
>>> Javabin codec.
>>> Was introduced in https://issues.apache.org/jira/browse/SOLR-12992
>>> 
>>> Please create a new JIRA issue for langid so we can try to fix it in 7.7.1
>>> 
>>> Other SolrInputDocument users assuming String type for strings in 
>>> SolrInputDocument would also be vulnerable.
>>> 
>>> I have a patch ready that you could test:
>>> 
>>> Index: 
>>> solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java
>>> IDEA additional info:
>>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
>>> <+>UTF-8
>>> ===
>>> --- 
>>> solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java
>>>   (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
>>> +++ 
>>> solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java
>>>   (date 1550217809000)
>>> @@ -60,12 +60,12 @@
>>>Collection fieldValues = doc.getFieldValues(fieldName);
>>>if (fieldValues != null) {
>>>  for (Object content : fieldValues) {
>>> -  if (content instanceof String) {
>>> -String stringContent = (String) content;
>>> +  if (content instanceof CharSequence) {
>>> +CharSequence stringContent = (CharSequence) content;
>>>  if (stringContent.length() > maxFieldValueChars) {
>>> -  detector.append(stringContent.substring(0, 
>>> maxFieldValueChars));
>>> +  detector.append(stringContent.subSequence(0, 
>>> maxFieldValueChars).toString());
>>>  } else {
>>> -  detector.append(stringContent);
>>> +  detector.append(stringContent.toString());
>>>  }
>>>  detector.append(" ");
>>>} else {
>>> Index: 
>>> solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java
>>> IDEA additional info:
>>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
>>> <+>UTF-8
>>> ===
>>> --- 
>>> solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java
>>> (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
>>> +++ 
>>> solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java
>>> (date 1550217691000)
>>> @@ -413,10 +413,10 @@
>>>  Collection fieldValues = doc.getFieldValues(fieldName);
>>>  if (fieldValues != null) {
>>>for (Object content : fieldValues) {
>>> -if (content instanceof String) {
>>> -  String stringContent = (String) content;
>>> +if (content instanceof CharSequence) {
>>> +  CharSequence stringContent = (CharSequence) content;
>>>if (stringContent.length() > maxFieldValueChars) {
>>> -sb.append(stringContent.substring(0, maxFieldValueChars));
>>> +sb.append(stringContent.subSequence(0, 
>>> maxFieldValueChars));
>>>} else {
>>>  sb.append(stringContent);
>>>}
>>> @@ -449,8 +449,8 @@
>>>  Collection contents = doc.getFieldValues(field);
>>>  if (contents != null) {
>>>for (Object content : contents) {
>>> -if (content instanceof String) {
>>> -  docSize += Math.min(((String) content).length(), 
>>> 

only error logging in solr

2019-02-18 Thread Bernd Fehling

Hi list,

Logging in Solr sounds easy, but the problem is logging only errors
together with the request which produced each error.
I want to log all 4xx and 5xx HTTP responses and also Solr ERRORs.

My request_logs from Jetty show nothing useful because of POST requests:
only that a request got HTTP 4xx or 5xx from Solr.

The INFO log level for solr_logs is not used because of too much log writing
at high QPS.

My solr_logs should report, for each ERROR, the request which produced it.

Has anyone an idea or solved this problem?

Is it possible to raise the level of a request from INFO to ERROR if
the request produced an ERROR in solr_logs?

Regards
Bernd





Re: Getting repeated Error - RunExecutableListener java.io.IOException

2019-02-18 Thread Jason Gerlowski
Hi Hemant,

configoverlay.json is not a file with content provided by Solr out of
the box.  Instead, it's used to hold any changes you make to Solr's
default configuration using the config API (/config).  More details at
the top of the article here:
https://lucene.apache.org/solr/guide/6_6/config-api.html
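For example, overrides end up in configoverlay.json through Config API calls
along these lines (the property and value here are only an illustration):

  curl http://localhost:8983/solr/<collection>/config \
    -H 'Content-Type: application/json' \
    -d '{"set-property": {"updateHandler.autoCommit.maxTime": 15000}}'

A matching "unset-property" command removes such an override again.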

So the fact that you see this in your configoverlay.json means that
someone with access to your Solr cluster specifically used the API to
request that RunExecutableListener configuration. (This is true of
everything in configoverlay.json).  Another possibility is that these
settings were requested via the API on a different cluster you had,
and then copied over to your new cluster by whoever set it up.  This
latter possibility could explain why your RunExecutableListener config
seems to be setup to run Linux commands, even though it runs on
Windows.

If you want you can delete configoverlay.json, or create a new
collection without it.  But since these configuration options were
chosen by a cluster admin on your end at some point (and aren't Solr
"defaults"), then your real first concern should be auditing what's in
configoverlay.json and seeing what still makes sense for use in your
cluster.

Hope that helps,

Jason

On Fri, Feb 15, 2019 at 2:05 PM Hemant Verma  wrote:
>
> Thanks Jan
> We are using Solr 6.6.3 version.
> We didn't configure RunExecutableListener in solrconfig.xml, it seems
> configured in configoverlay.json as default. Even we don't want to configure
> RunExecutableListener.
>
> Is it mandatory to use configoverlay.json or can we get rid of it? If yes
> can you share details.
>
> Attached the solrconfig.xml
>
> solrconfig.xml
> 
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Quick Favor?

2019-02-18 Thread gksachin04
Hey,

I just signed the petition "President of India: Campaign to Abolish Article
370" and wanted to see if you could help by adding your name.

Our goal is to reach 50,000 signatures and we need more support. You can
read more and sign the petition here:

http://chng.it/kGdFrQSv4F

Thanks!
Sachin


Re: Createsnapshot null pointer exception

2019-02-18 Thread Jan Høydahl
Hi

You take all the risk by using unsupported features.
A supported way of achieving the same could perhaps be:

1) Create a new empty collection "gatewaycoll" on the nodes you want to 
dedicate as "gateways"
2) Send your queries to the gateway collection but ask for data from the data 
collection "datacoll"
http://some.solr.server:8983/solr/gatewaycoll?q=my query&collection=datacoll

Have not tested it but it should give the desired effect. You can hardcode the 
"collection" param in solrconfig.xml of gatewaycoll's /select handler if you 
wish
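A minimal sketch of that (equally untested), assuming the stock /select handler
of gatewaycoll:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="collection">datacoll</str>
    </lst>
  </requestHandler>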

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 18 Feb 2019, at 08:59, SOLR4189 wrote:
> 
> I think you don't understand what I mean.
> 
> 1) I create a collection with X shards, each shard has a hash range (by the CREATE
> collection command)
> 2) I add Y new shards to the same collection, each shard has no hash range;
> I call them gateways (by the CREATE core command)
> 3) I add a LoadBalancer over the Y gateways, so all client queries will pass
> through the gateways
> 
> In this case, my Y gateways only forward queries and merge results (WITHOUT
> searching their own index) and my X shards only search their own index (WITHOUT
> forwarding queries and merging results). This gives me the best query
> performance.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re-read from CloudSolrStream

2019-02-18 Thread SOLR4189
Hi all,

Let's say I have the following code (based on
http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;

public class StreamingClient {

    public static void main(String[] args) throws IOException {
        String zkHost = args[0];
        String collection = args[1];

        // /export streams the whole result set, sorted by fieldA
        Map<String, String> props = new HashMap<>();
        props.put("q", "*:*");
        props.put("qt", "/export");
        props.put("sort", "fieldA asc");
        props.put("fl", "fieldA,fieldB,fieldC");

        CloudSolrStream cstream = new CloudSolrStream(zkHost, collection, props);
        try {
            cstream.open();
            // read tuples until the EOF marker tuple is reached
            while (true) {
                Tuple tuple = cstream.read();
                if (tuple.EOF) {
                    break;
                }
                String fieldA = tuple.getString("fieldA");
                String fieldB = tuple.getString("fieldB");
                String fieldC = tuple.getString("fieldC");
                System.out.println(fieldA + ", " + fieldB + ", " + fieldC);
            }
        } finally {
            cstream.close();
        }
    }
}

What can I do if I get an exception at the line *Tuple tuple =
cstream.read();*? How can I re-read the same tuple, i.e. continue from
the moment of the exception?




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Createsnapshot null pointer exception

2019-02-18 Thread SOLR4189
I think you don't understand what I mean.

1) I create a collection with X shards, each shard has a hash range (by the CREATE
collection command)
2) I add Y new shards to the same collection, each shard has no hash range;
I call them gateways (by the CREATE core command)
3) I add a LoadBalancer over the Y gateways, so all client queries will pass
through the gateways

In this case, my Y gateways only forward queries and merge results (WITHOUT
searching their own index) and my X shards only search their own index (WITHOUT
forwarding queries and merging results). This gives me the best query
performance.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html