Re: Exception while using {!cardinality=1.0}.

2015-08-17 Thread Modassar Ather
Any suggestions please.

Regards,
Modassar

On Thu, Aug 13, 2015 at 4:25 PM, Modassar Ather 
wrote:

> Hi,
>
> I am getting the following exception for the query:
> *q=field:query&stats=true&stats.field={!cardinality=1.0}field*. The
> exception is not seen once the cardinality is set to 0.9 or less.
> The field is *docValues enabled* and *indexed=false*. I tried to reproduce
> the same exception on a non-docValues field but could not. Please help me
> resolve the issue.
>
> ERROR - 2015-08-11 12:24:00.222; [core]
> org.apache.solr.common.SolrException;
> null:java.lang.ArrayIndexOutOfBoundsException: 3
> at
> net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
> at
> net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
> at net.agkn.hll.HLL.toBytes(HLL.java:917)
> at net.agkn.hll.HLL.toBytes(HLL.java:869)
> at
> org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
> at
> org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
> at
> org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
> at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> at org.eclipse.jetty.server.Server.handle(Server.java:497)
> at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> at
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
> at java.lang.Thread.run(Thread.java:745)
>
> Kindly let me know if I need to ask this on any of the related JIRA issues.
>
> Thanks,
> Modassar
>


Re: Query time out. Solr node goes down.

2015-08-17 Thread Modassar Ather
I tried to profile the memory of each Solr node. I can see the GC activity
going as high as 98%, and there are many instances where it has gone
above 10%. On one of the Solr nodes I can see it going to 45%.
Memory is fully used and has reached the maximum heap usage, which is
set to 24g. During other searches I can see the error
*org.apache.solr.common.SolrException: no servers hosting shard.*
A few nodes are in the "gone" state. There are many instances of
*org.apache.solr.common.SolrException:
org.apache.zookeeper.KeeperException$SessionExpiredException.*
The GC logs show very busy garbage collection. Please provide your inputs.

On Tue, Aug 18, 2015 at 10:38 AM, Modassar Ather 
wrote:

> Shawn! The container I am using is jetty only and the JVM setting I am
> using is the default one which comes with Solr startup scripts. Yes I have
> changed the JVM memory setting as mentioned.
> Kindly help me understand, even if there is a a GC pause why the solr node
> will go down. At least for other queries is should not throw exception of
> *org.apache.solr.common.SolrException: no servers hosting shard.*
> Why the node will throw above exception even a huge query is time out or
> may have taken lot of resources. Kindly help me understand in what
> conditions such exception can arise as I am not fully aware of it.
>
> Daniel! The error logs do not say if it was JVM crash or just solr. But by
> the exception I understand that it might have gone to a state from where it
> recovered after sometime. I did not restart the Solr.
>
> On Mon, Aug 17, 2015 at 10:12 PM, Daniel Collins 
> wrote:
>
>> When you say "the solr node goes down", what do you mean by that? From
>> your
>> comment on the logs, you obviously lose the solr core at best (you do
>> realize only having a single replica is inherently susceptible to failure,
>> right?)
>> But do you mean the Solr Core drops out of the collection (ZK timeout),
>> the
>> JVM stops, the whole machine crashes?
>>
>> On 17 August 2015 at 14:17, Shawn Heisey  wrote:
>>
>> > On 8/17/2015 5:45 AM, Modassar Ather wrote:
>> > > The servers have 32g memory each. Solr JVM memory is set to -Xms20g
>> > > -Xmx24g. There are no OOM in logs.
>> >
>> > Are you starting Solr 5.2.1 with the included start script, or have you
>> > installed it into another container?
>> >
>> > Assuming you're using the download's "bin/solr" script, that will
>> > normally set Xms and Xmx to the same value, so if you have overridden
>> > the memory settings such that you can have different values in Xms and
>> > Xmx, have you also overridden the garbage collection parameters?  If you
>> > have, what are they set to now?  You can see all arguments used on
>> > startup in the "JVM" section of the admin UI dashboard.
>> >
>> > If you've installed in an entirely different container, or you have
>> > overridden the garbage collection settings, then a 24GB heap might have
>> > extreme garbage collection pauses, lasting long enough to exceed the
>> > timeout.
>> >
>> > Giving 24 out of 32GB to Solr will mean that there is only (at most) 8GB
>> > left over for caching the index.  With 200GB of index, this is nowhere
>> > near enough, and is another likely source of Solr performance problems
>> > that cause timeouts.  This is what Upayavira was referring to in his
>> > reply.  For good performance with 200GB of index, you may need a lot
>> > more than 32GB of total RAM.
>> >
>> > https://wiki.apache.org/solr/SolrPerformanceProblems
>> >
>> > This wiki page also describes how you can use jconsole to judge how much
>> > heap you actually need.  24GB may be too much.
>> >
>> > Thanks,
>> > Shawn
>> >
>> >
>>
>
>


RE: Solr Caching (documentCache) not working

2015-08-17 Thread Maulin Rathod
Hi Shawn,



Thanks for your feedback.

In our scenario documents are added frequently (approximately 10 documents per 
minute) and we want to make them available for search in near real time (within 
5 seconds).  Even if we set autoSoftCommit to 5 seconds (so that documents become 
available for search after 5 seconds), it flushes all documents from the 
documentCache. I just wanted to understand whether we are doing something wrong 
or whether this is Solr's expected behavior.







<autoSoftCommit>
   <maxTime>5000</maxTime>
</autoSoftCommit>







Regards,



Maulin







-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: 17 August 2015 19:02
To: solr-user@lucene.apache.org
Subject: Re: Solr Caching (documentCache) not working



On 8/17/2015 7:04 AM, Maulin Rathod wrote:

> We have observed that intermittently querying becomes slower when 
> the documentCache becomes empty. The documentCache is getting flushed whenever 
> a new document is added to the collection.

>

> Is there any way by which we can ensure that newly added documents are 
> visible without losing data in documentCache? We are trying to use soft 
> commit but it also flushes all documents in documentCache.







> <autoSoftCommit>
>   <maxTime>50</maxTime>
> </autoSoftCommit>



You are doing a soft commit within 50 milliseconds of adding a new document.  
Solr can have severe performance problems when autoSoftCommit is set to 1000 -- 
one second.  50 milliseconds is one twentieth of a very low value that is known 
to cause problems.  It can make the problem much more than 20 times worse.



Please read this article:



http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/



Note one particular section, which says the following:  Don’t listen to your 
product manager who says "we need no more than 1 second latency".



You need to set your commit interval as long as you possibly can.  I personally 
wouldn't go lower than 60 seconds, or 30 seconds if the commits complete 
particularly fast.  It should be several minutes if that will meet your needs.  
When your commit interval is very low, Solr's caches can become useless, as 
you've noticed.



TL;DR info:  Your autoCommit settings have openSearcher set to false, so they 
do not matter for the problem you have described. I would probably increase 
that to 5 minutes rather than 15 seconds, but that is not very important here, 
and 15 seconds for hard commits that don't open a new searcher is known to have 
a low impact on performance.  "Low" impact isn't the same as NO impact, so I 
keep this interval long as well.
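
As a concrete illustration of the above (the numbers are only examples, not a 
one-size-fits-all recommendation), commit settings along these lines in 
solrconfig.xml would follow that advice:

<!-- Hard commit: flush to disk regularly, but do not open a new searcher -->
<autoCommit>
  <maxTime>300000</maxTime>            <!-- 5 minutes -->
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: controls when new documents become visible to searches;
     keep it as long as the application can tolerate so caches stay useful -->
<autoSoftCommit>
  <maxTime>60000</maxTime>             <!-- 60 seconds -->
</autoSoftCommit>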



Thanks,

Shawn




Re: Query time out. Solr node goes down.

2015-08-17 Thread Modassar Ather
Shawn! The container I am using is Jetty only, and the JVM settings I am
using are the defaults that come with the Solr startup scripts. Yes, I have
changed the JVM memory setting as mentioned.
Kindly help me understand why, even if there is a GC pause, the Solr node
will go down. At least for other queries it should not throw
*org.apache.solr.common.SolrException: no servers hosting shard.*
Why will the node throw the above exception even if a huge query times out
or has taken a lot of resources? Kindly help me understand in what
conditions such an exception can arise, as I am not fully aware of it.

Daniel! The error logs do not say whether it was a JVM crash or just Solr. But
from the exception I understand that it might have gone to a state from which
it recovered after some time. I did not restart Solr.

On Mon, Aug 17, 2015 at 10:12 PM, Daniel Collins 
wrote:

> When you say "the solr node goes down", what do you mean by that? From your
> comment on the logs, you obviously lose the solr core at best (you do
> realize only having a single replica is inherently susceptible to failure,
> right?)
> But do you mean the Solr Core drops out of the collection (ZK timeout), the
> JVM stops, the whole machine crashes?
>
> On 17 August 2015 at 14:17, Shawn Heisey  wrote:
>
> > On 8/17/2015 5:45 AM, Modassar Ather wrote:
> > > The servers have 32g memory each. Solr JVM memory is set to -Xms20g
> > > -Xmx24g. There are no OOM in logs.
> >
> > Are you starting Solr 5.2.1 with the included start script, or have you
> > installed it into another container?
> >
> > Assuming you're using the download's "bin/solr" script, that will
> > normally set Xms and Xmx to the same value, so if you have overridden
> > the memory settings such that you can have different values in Xms and
> > Xmx, have you also overridden the garbage collection parameters?  If you
> > have, what are they set to now?  You can see all arguments used on
> > startup in the "JVM" section of the admin UI dashboard.
> >
> > If you've installed in an entirely different container, or you have
> > overridden the garbage collection settings, then a 24GB heap might have
> > extreme garbage collection pauses, lasting long enough to exceed the
> > timeout.
> >
> > Giving 24 out of 32GB to Solr will mean that there is only (at most) 8GB
> > left over for caching the index.  With 200GB of index, this is nowhere
> > near enough, and is another likely source of Solr performance problems
> > that cause timeouts.  This is what Upayavira was referring to in his
> > reply.  For good performance with 200GB of index, you may need a lot
> > more than 32GB of total RAM.
> >
> > https://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > This wiki page also describes how you can use jconsole to judge how much
> > heap you actually need.  24GB may be too much.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Solr 4.6.1 Cloud Stops Replication

2015-08-17 Thread Rallavagu
By the time the last email was sent, the other node had also caught up. Makes me 
wonder what happened and how this works.


Thanks

On 8/17/15 9:53 PM, Rallavagu wrote:

response inline..

On 8/17/15 8:40 PM, Erick Erickson wrote:

Is this 4 shards? Two shards each with a leader and follower? Details
matter a lot


It is a single collection single shard.



What, if anything, is in the log file for the down nodes? I'm assuming
that when you
start, all the nodes are active


During the update process found following exceptions

org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting 
for connection from pool
at 
org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:232)
at 
org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:199)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:456)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:232)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)

at java.lang.Thread.run(Thread.java:682)

However, after couple of hours one of the nodes (out of two that were 
trailing) caught up with status "Active". However, other node is still 
in state "Down". It has following message.


"Log replay finished. recoveryInfo=RecoveryInfo{adds=2009581 
deletes=148 deleteByQuery=0 errors=0 positionOfStart=0}"


I am trying to understand the behavior and wondering is there a way to 
"trigger" the updates to other participating nodes in the cloud.


Also, I have noticed that the memory consumption goes very high. For 
instance, each node is configured with 48G memory while java heap is 
configured with 12G. The available physical memory is consumed almost 
46G and the heap size is well within the limits (at this time it is at 
8G). Is there a documentation or to understand this behavior? I 
suspect it could be lucene related memory consumption but not sure.





You might review:
http://wiki.apache.org/solr/UsingMailingLists


Sorry for not being very clear to start with. Hope the provided 
information would help.


Thanks



Best,
Erick

On Mon, Aug 17, 2015 at 6:19 PM, Rallavagu  wrote:

Hello,

Have 4 nodes participating solr cloud. After indexing about 2 mil 
documents,
only two nodes are "Active" (green) while other two are shown as 
"down". How

can I "initialize" the replication from leader so other two nodes would
receive updates?

Thanks




Re: Solr 4.6.1 Cloud Stops Replication

2015-08-17 Thread Rallavagu

response inline..

On 8/17/15 8:40 PM, Erick Erickson wrote:

Is this 4 shards? Two shards each with a leader and follower? Details
matter a lot


It is a single collection single shard.



What, if anything, is in the log file for the down nodes? I'm assuming
that when you
start, all the nodes are active


During the update process I found the following exceptions:

org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for 
connection from pool
	at 
org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:232)
	at 
org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:199)
	at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:456)
	at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
	at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:232)
	at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)

at java.lang.Thread.run(Thread.java:682)

However, after a couple of hours one of the nodes (out of the two that were 
trailing) caught up with status "Active". The other node is still 
in state "Down". It has the following message:


"Log replay finished. recoveryInfo=RecoveryInfo{adds=2009581 deletes=148 
deleteByQuery=0 errors=0 positionOfStart=0}"


I am trying to understand the behavior and wondering whether there is a way to 
"trigger" the updates to the other participating nodes in the cloud.


Also, I have noticed that the memory consumption goes very high. For 
instance, each node is configured with 48G of memory while the Java heap is 
configured with 12G. Almost 46G of the available physical memory is 
consumed, while the heap size is well within its limit (at this time it is 
at 8G). Is there documentation that would help me understand this behavior? 
I suspect it could be Lucene-related memory consumption but am not sure.





You might review:
http://wiki.apache.org/solr/UsingMailingLists


Sorry for not being very clear to start with. I hope the provided 
information helps.


Thanks



Best,
Erick

On Mon, Aug 17, 2015 at 6:19 PM, Rallavagu  wrote:

Hello,

Have 4 nodes participating solr cloud. After indexing about 2 mil documents,
only two nodes are "Active" (green) while other two are shown as "down". How
can I "initialize" the replication from leader so other two nodes would
receive updates?

Thanks


Re: Solr 4.6.1 Cloud Stops Replication

2015-08-17 Thread Erick Erickson
Is this 4 shards? Two shards each with a leader and follower? Details
matter a lot

What, if anything, is in the log file for the down nodes? I'm assuming
that when you
start, all the nodes are active

You might review:
http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

On Mon, Aug 17, 2015 at 6:19 PM, Rallavagu  wrote:
> Hello,
>
> Have 4 nodes participating solr cloud. After indexing about 2 mil documents,
> only two nodes are "Active" (green) while other two are shown as "down". How
> can I "initialize" the replication from leader so other two nodes would
> receive updates?
>
> Thanks


Re: SolrCloud Shard Order & Hash Keys

2015-08-17 Thread Sathiya N Sundararajan
Great, thanks Yonik.

On Mon, Aug 17, 2015 at 5:16 PM, Yonik Seeley  wrote:

> On Mon, Aug 17, 2015 at 8:00 PM, Sathiya N Sundararajan
>  wrote:
> > Folks:
> >
> > Question regarding SolrCloud Shard Number (Ex: shard) & associated
> hash
> > ranges. We are in the process of identifying the best strategy to merge
> > shards that belong to collections that are chronologically older which
> sees
> > very low volume of searches compared to the collections with most recent
> > data.
> >
> > What we ran into is that often times we find that Shard numbers are hash
> > ranges don’t necessarily correlate:
> >
> > shard1: 8000-aaa9
> > shard2: -d554
> > shard3: d555- ( holds the last range )
> > shard4: 0-2aa9 ( holds the starting range )
> > shard5: 2aaa-5554
> > shard6: -7fff
>
>
> It's not really clear what you mean by "correlate"... but I think
> there are 2 different points to make:
> 1) This is the hex representation of a signed integer, so 8000 is
> the start of the complete hash range, and 7fff is the end.
> 2) The numbers in shard1, shard2, etc, names are meaningless... just
> names like shard_foo and shard_bar.  They do not need to be ordered in
> any way with respect to each other.
>
> -Yonik
>
> > same goes for 'core_node’ that does not follow order neither it
> > correlates with shard. Meaning core_node<1> does not contain the keys
> > starting from 0 nor does it map to shard<1>.
> >
> > {"shard1"=>
> >   {"range"=>"8000-aaa9",
> > {"core_node5"=>
> >   "core"=>"post_NW_201508_shard1_replica1",
> >   "shard2"=>
> > {"range"=>"-d554",
> >   {"core_node6"=>
> > "core"=>"post_NW_201508_shard2_replica1",
> >   "shard3"=>
> > {"range"=>"d555-",
> >   {"core_node2"=>
> > "core"=>"post_NW_201508_shard3_replica1",
> >   "shard4"=>
> > {"range"=>"0-2aa9",
> >   {"core_node3"=>
> > "core"=>"post_NW_201508_shard4_replica1",
> >   "shard5"=>
> > {"range"=>"2aaa-5554",
> >   {"core_node4"=>
> > "core"=>"post_NW_201508_shard5_replica1",
> >   "shard6"=>
> > {"range"=>"-7fff",
> >   {"core_node1"=>
> > "core"=>"post_NW_201508_shard6_replica1"
> >
> >
> > Why would this be a concern ?
> >
> >1. Lets say if we merge the indexes of adjacent shards (to reduce the
> >number of shards in the collection). In this case it will be merging
> >"core_node3: 0-2aa9” & "core_node4: 2aaa-5554” . What
> would the
> >index of the new core_node directory ? core_node
> >2. When we copy this data over to the cluster after recreating the
> >collection with reduced number of shards, how would the cluster infer
> the
> >hash range from the index data or how does it reconcile with the
> metadata
> >about the shards in the local filesystem of cluster nodes.
> >3. How should we approach this problem to guarantee Solr picks up the
> >right key order from the merged indexes ?
> >
> >
> >
> > *Solr 4.4*
> > *HDFS for Index Storage*
>


Solr 4.6.1 Cloud Stops Replication

2015-08-17 Thread Rallavagu

Hello,

I have 4 nodes participating in a Solr cloud. After indexing about 2 million 
documents, only two nodes are "Active" (green) while the other two are shown 
as "down". How can I "initialize" the replication from the leader so the other 
two nodes will receive updates?


Thanks


Re: SolrCloud Shard Order & Hash Keys

2015-08-17 Thread Yonik Seeley
On Mon, Aug 17, 2015 at 8:00 PM, Sathiya N Sundararajan
 wrote:
> Folks:
>
> Question regarding SolrCloud Shard Number (Ex: shard) & associated hash
> ranges. We are in the process of identifying the best strategy to merge
> shards that belong to collections that are chronologically older which sees
> very low volume of searches compared to the collections with most recent
> data.
>
> What we ran into is that often times we find that Shard numbers are hash
> ranges don’t necessarily correlate:
>
> shard1: 8000-aaa9
> shard2: -d554
> shard3: d555- ( holds the last range )
> shard4: 0-2aa9 ( holds the starting range )
> shard5: 2aaa-5554
> shard6: -7fff


It's not really clear what you mean by "correlate"... but I think
there are 2 different points to make:
1) This is the hex representation of a signed integer, so 8000 is
the start of the complete hash range, and 7fff is the end.
2) The numbers in shard1, shard2, etc, names are meaningless... just
names like shard_foo and shard_bar.  They do not need to be ordered in
any way with respect to each other.
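
For illustration, point 1 in a couple of lines of Java (nothing Solr-specific,
just the signed 32-bit arithmetic):

public class HashRangeEnds {
    public static void main(String[] args) {
        // The hash range covers the full signed 32-bit space, shown in hex:
        // Integer.MIN_VALUE prints as 80000000 and Integer.MAX_VALUE as 7fffffff,
        // so a shard whose range starts at "8000..." holds the lowest hashes and
        // one whose range ends at "7fff..." holds the highest.
        System.out.println(Integer.toHexString(Integer.MIN_VALUE)); // 80000000
        System.out.println(Integer.toHexString(Integer.MAX_VALUE)); // 7fffffff
    }
}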

-Yonik

> same goes for 'core_node’ that does not follow order neither it
> correlates with shard. Meaning core_node<1> does not contain the keys
> starting from 0 nor does it map to shard<1>.
>
> {"shard1"=>
>   {"range"=>"8000-aaa9",
> {"core_node5"=>
>   "core"=>"post_NW_201508_shard1_replica1",
>   "shard2"=>
> {"range"=>"-d554",
>   {"core_node6"=>
> "core"=>"post_NW_201508_shard2_replica1",
>   "shard3"=>
> {"range"=>"d555-",
>   {"core_node2"=>
> "core"=>"post_NW_201508_shard3_replica1",
>   "shard4"=>
> {"range"=>"0-2aa9",
>   {"core_node3"=>
> "core"=>"post_NW_201508_shard4_replica1",
>   "shard5"=>
> {"range"=>"2aaa-5554",
>   {"core_node4"=>
> "core"=>"post_NW_201508_shard5_replica1",
>   "shard6"=>
> {"range"=>"-7fff",
>   {"core_node1"=>
> "core"=>"post_NW_201508_shard6_replica1"
>
>
> Why would this be a concern ?
>
>1. Lets say if we merge the indexes of adjacent shards (to reduce the
>number of shards in the collection). In this case it will be merging
>"core_node3: 0-2aa9” & "core_node4: 2aaa-5554” . What would the
>index of the new core_node directory ? core_node
>2. When we copy this data over to the cluster after recreating the
>collection with reduced number of shards, how would the cluster infer the
>hash range from the index data or how does it reconcile with the metadata
>about the shards in the local filesystem of cluster nodes.
>3. How should we approach this problem to guarantee Solr picks up the
>right key order from the merged indexes ?
>
>
>
> *Solr 4.4*
> *HDFS for Index Storage*


SolrCloud Shard Order & Hash Keys

2015-08-17 Thread Sathiya N Sundararajan
Folks:

Question regarding SolrCloud Shard Number (Ex: shard) & associated hash
ranges. We are in the process of identifying the best strategy to merge
shards that belong to chronologically older collections, which see a
very low volume of searches compared to the collections with the most recent
data.

What we ran into is that oftentimes we find that shard numbers and hash
ranges don’t necessarily correlate:

shard1: 8000-aaa9
shard2: -d554
shard3: d555- ( holds the last range )
shard4: 0-2aa9 ( holds the starting range )
shard5: 2aaa-5554
shard6: -7fff


The same goes for 'core_node’, which does not follow order, nor does it
correlate with the shard. Meaning core_node<1> does not contain the keys
starting from 0, nor does it map to shard<1>.

{"shard1"=>
  {"range"=>"8000-aaa9",
{"core_node5"=>
  "core"=>"post_NW_201508_shard1_replica1",
  "shard2"=>
{"range"=>"-d554",
  {"core_node6"=>
"core"=>"post_NW_201508_shard2_replica1",
  "shard3"=>
{"range"=>"d555-",
  {"core_node2"=>
"core"=>"post_NW_201508_shard3_replica1",
  "shard4"=>
{"range"=>"0-2aa9",
  {"core_node3"=>
"core"=>"post_NW_201508_shard4_replica1",
  "shard5"=>
{"range"=>"2aaa-5554",
  {"core_node4"=>
"core"=>"post_NW_201508_shard5_replica1",
  "shard6"=>
{"range"=>"-7fff",
  {"core_node1"=>
"core"=>"post_NW_201508_shard6_replica1"


Why would this be a concern ?

   1. Let's say we merge the indexes of adjacent shards (to reduce the
   number of shards in the collection). In this case it will be merging
   "core_node3: 0-2aa9” & "core_node4: 2aaa-5554”. What would the
   index of the new core_node directory be? core_node
   2. When we copy this data over to the cluster after recreating the
   collection with reduced number of shards, how would the cluster infer the
   hash range from the index data or how does it reconcile with the metadata
   about the shards in the local filesystem of cluster nodes.
   3. How should we approach this problem to guarantee Solr picks up the
   right key order from the merged indexes ?



*Solr 4.4*
*HDFS for Index Storage*


Re: Solr indexing based on last_modified

2015-08-17 Thread Erick Erickson
Well, you'll have to have some kind of timestamp that you can
reference and only re-send
files that have a newer timestamp. Or keep a DB around with file
path/last indexed timestamp
or

Best,
Erick

On Mon, Aug 17, 2015 at 12:36 PM, coolmals  wrote:
> I have a file system. I have a scheduler which will call solr in scheduled
> time interval. Any updates to the file system must be indexed by solr. Only
> changes must be re-indexed as file system is huge and cannot be re-indexed
> every time.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-indexing-based-on-last-modified-tp4223506p4223511.html
> Sent from the Solr - User mailing list archive at Nabble.com.


RE: SOLR to pivot on date range query

2015-08-17 Thread Lewin Joy (TMS)
Hi Yonik,

Thank you for the reply. I followed your link, and this feature is really 
awesome to have.
But unfortunately I am using Solr 4.4 on Cloudera right now. 
I tried this; it looks like it does not work in this version.
Sorry, I forgot to mention that in my original mail.
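
For reference, with an index-time MONTH-YEAR field of the kind described in the 
original mail, the counts can still be pulled on 4.x with a pivot facet. A rough 
sketch (field names are placeholders, and facet.pivot only became 
distributed-aware in later 4.x releases, so this applies per shard):

q=*:*&rows=0
&fq=entryDate:[2015-01-01T00:00:00Z TO 2015-04-01T00:00:00Z]
&facet=true
&facet.pivot=monthYear,entryType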

Thanks,
Lewin

-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com] 
Sent: Monday, August 17, 2015 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR to pivot on date range query

The JSON Facet API can embed any type of facet within any other type:
http://yonik.com/json-facet-api/

json.facet={
  dates : {
    type : range,
    field : entryDate,
    start : "2001-...",  // use full solr date format
    end : "2015...",
    gap : "+1MONTH",
    facet : {
      types : {          // sub-facets are named
        type : terms,
        field : entryType
      }
    }
  }
}

-Yonik


On Mon, Aug 17, 2015 at 3:16 PM, Lewin Joy (TMS)  wrote:
> Hi,
>
> I have data that is coming in everyday. I need to query the index for a time 
> range and give the facet counts ordered by different months.
> For this, I just have a solr date field, entryDate which captures the time.
>
> How do I make this query? I need the results like below.
>
> Jan-2015 (2000)
> entryType=Sales(750)
> entryType=Complaints(200)
> entryType=Feedback(450)
> Feb-2015(3200)
> entryType=Sales(1000)
> entryType=Complaints(250)
> entryType=Feedback(600)
> Mar-2015(2800)
> entryType=Sales(980)
> entryType=Complaints(220)
> entryType=Feedback(400)
>
>
> I tried Range queries on 'entryDate' field to order the result facets by 
> month.
> But, I am not able to pivot on the 'entryType' field to bring the counts of 
> "sales,complaints and feedback" type record by month.
>
> For now, I am creating another field at index time to have the value for 
> "MONTH-YEAR" derived from the 'entryDate' field.
> But for older records, it becomes a hassle. Is there a way I can handle this 
> at query time?
> Or is there a better way to handle this situation?
>
> Please let me know. Any thoughts / suggestions are valuable.
>
> Thanks,
> Lewin
>


Re: Solr Caching (documentCache) not working

2015-08-17 Thread Yonik Seeley
On Mon, Aug 17, 2015 at 4:36 PM, Daniel Collins  wrote:
> we had to turn off
> ALL the Solr caches (warming is useless at that kind of frequency

Warming and caching are related, but different.  Caching still
normally makes sense without warming, and Solr is generally written
with the assumption that caches are present.

-Yonik


Re: Solr Caching (documentCache) not working

2015-08-17 Thread Mikhail Khludnev
On Mon, Aug 17, 2015 at 11:36 PM, Daniel Collins 
wrote:

> Just to open the can of worms, it *can* be possible to have very low commit
> times, we have 250ms currently and are in production with that.  But it
> does come with pain (no such thing as a free lunch!), we had to turn off
> ALL the Solr caches


Gentlemen,
Excuse me for hijacking; here is a small "segmentation" lunchbox:
 - segmented filters, which are much cheaper on commit:
http://blog.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html
 - docValues facets on steroids:
https://issues.apache.org/jira/browse/SOLR-7730 (since 5.3)
But the document cache hasn't been sliced yet. Thus, I'd prefer that
https://issues.apache.org/jira/browse/SOLR-7937 hang for a while and be
pursued later.


> (warming is useless at that kind of frequency, it will
> take longer to warm the cache than the time before the next commit), and
> throw a lot of RAM and expensive SSDs at the problem.
>
> That said, Shawn's advice is correct, anything less than 1s commit
> shouldn't be needed for most users, and I would concur with staying away
> from it unless you absolutely decide you have to have it.
>
> You only go that route if you are prepared to commit (no pun intended!) a
> fair amount of time, money and resources to investigating and dealing with
> issues.  We will have a talk at Revolution this year about some of the
> scale and latency issues we have to deal with (blatant plug for my team
> lead who's giving the talk!)
>
> On 17 August 2015 at 14:31, Shawn Heisey  wrote:
>
> > On 8/17/2015 7:04 AM, Maulin Rathod wrote:
> > > We have observed that Intermittently querying become slower when
> > documentCache become empty. The documentCache is getting flushed whenever
> > new document added to the collection.
> > >
> > > Is there any way by which we can ensure that newly added documents are
> > visible without losing data in documentCache? We are trying to use soft
> > commit but it also flushes all documents in documentCache.
> >
> > 
> >
> > > <autoSoftCommit>
> > >   <maxTime>50</maxTime>
> > > </autoSoftCommit>
> >
> > You are doing a soft commit within 50 milliseconds of adding a new
> > document.  Solr can have severe performance problems when autoSoftCommit
> > is set to 1000 -- one second.  50 milliseconds is one twentieth of a
> > very low value that is known to cause problems.  It can make the problem
> > much more than 20 times worse.
> >
> > Please read this article:
> >
> >
> >
> http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >
> > Note one particular section, which says the following:  Don’t listen to
> > your product manager who says "we need no more than 1 second latency".
> >
> > You need to set your commit interval as long as you possibly can.  I
> > personally wouldn't go longer than 60 seconds, 30 seconds if the commits
> > complete particularly fast.  It should be several minutes if that will
> > meet your needs.  When your commit interval is very low, Solr's caches
> > can become useless, as you've noticed.
> >
> > TL;DR info:  Your autoCommit settings have openSearcher set to false, so
> > they do not matter for the problem you have described. I would probably
> > increase that to 5 minutes rather than 15 seconds, but that is not very
> > important here, and 15 seconds for hard commits that don't open a new
> > searcher is known to have a low impact on performance.  "Low" impact
> > isn't the same as NO impact, so I keep this interval long as well.
> >
> > Thanks,
> > Shawn
> >
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Solr Caching (documentCache) not working

2015-08-17 Thread Daniel Collins
Just to open the can of worms, it *can* be possible to have very low commit
times, we have 250ms currently and are in production with that.  But it
does come with pain (no such thing as a free lunch!), we had to turn off
ALL the Solr caches (warming is useless at that kind of frequency, it will
take longer to warm the cache than the time before the next commit), and
throw a lot of RAM and expensive SSDs at the problem.

That said, Shawn's advice is correct, anything less than 1s commit
shouldn't be needed for most users, and I would concur with staying away
from it unless you absolutely decide you have to have it.

You only go that route if you are prepared to commit (no pun intended!) a
fair amount of time, money and resources to investigating and dealing with
issues.  We will have a talk at Revolution this year about some of the
scale and latency issues we have to deal with (blatant plug for my team
lead who's giving the talk!)

On 17 August 2015 at 14:31, Shawn Heisey  wrote:

> On 8/17/2015 7:04 AM, Maulin Rathod wrote:
> > We have observed that Intermittently querying become slower when
> documentCache become empty. The documentCache is getting flushed whenever
> new document added to the collection.
> >
> > Is there any way by which we can ensure that newly added documents are
> visible without losing data in documentCache? We are trying to use soft
> commit but it also flushes all documents in documentCache.
>
> 
>
> > <autoSoftCommit>
> >   <maxTime>50</maxTime>
> > </autoSoftCommit>
>
> You are doing a soft commit within 50 milliseconds of adding a new
> document.  Solr can have severe performance problems when autoSoftCommit
> is set to 1000 -- one second.  50 milliseconds is one twentieth of a
> very low value that is known to cause problems.  It can make the problem
> much more than 20 times worse.
>
> Please read this article:
>
>
> http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Note one particular section, which says the following:  Don’t listen to
> your product manager who says "we need no more than 1 second latency".
>
> You need to set your commit interval as long as you possibly can.  I
> personally wouldn't go longer than 60 seconds, 30 seconds if the commits
> complete particularly fast.  It should be several minutes if that will
> meet your needs.  When your commit interval is very low, Solr's caches
> can become useless, as you've noticed.
>
> TL;DR info:  Your autoCommit settings have openSearcher set to false, so
> they do not matter for the problem you have described. I would probably
> increase that to 5 minutes rather than 15 seconds, but that is not very
> important here, and 15 seconds for hard commits that don't open a new
> searcher is known to have a low impact on performance.  "Low" impact
> isn't the same as NO impact, so I keep this interval long as well.
>
> Thanks,
> Shawn
>
>


Re: Solr indexing based on last_modified

2015-08-17 Thread coolmals
I have a file system. I have a scheduler which will call Solr at a scheduled
time interval. Any updates to the file system must be indexed by Solr. Only
changes must be re-indexed, as the file system is huge and cannot be re-indexed
every time.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-based-on-last-modified-tp4223506p4223511.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr indexing based on last_modified

2015-08-17 Thread Erick Erickson
There's no way that I know of with post.jar. Post.jar was never really intended
as a production tool, and sending all the files to Solr for parsing (pdf, word
and the like) is putting quite a load on the Solr server.

What is your use-case? You might consider a SolrJ program; it would be
simple enough to pass it a timestamp and only parse/send docs to Solr
if the date was more recent. Here's an example (no timestamp
processing though).

https://lucidworks.com/blog/indexing-with-solrj/
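
For illustration, a bare-bones sketch of that idea (class and field names are 
made up, the 5.x SolrJ HttpSolrClient is assumed, and real content extraction 
-- Tika or the extracting handler -- is left out):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.util.stream.Stream;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IncrementalIndexer {

    public static void main(String[] args) throws IOException, SolrServerException {
        Path root = Paths.get(args[0]);
        // Timestamp of the previous run, e.g. read from a small state file kept by the scheduler.
        Instant lastRun = Instant.parse(args[1]);

        try (HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/mycollection");
             Stream<Path> files = Files.walk(root)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> isNewerThan(p, lastRun))
                 .forEach(p -> send(solr, p));
            solr.commit();
        }
    }

    // Only files modified since the last run are re-sent to Solr.
    private static boolean isNewerThan(Path p, Instant lastRun) {
        try {
            return Files.getLastModifiedTime(p).toInstant().isAfter(lastRun);
        } catch (IOException e) {
            return false;
        }
    }

    private static void send(HttpSolrClient solr, Path p) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", p.toAbsolutePath().toString());
        doc.addField("last_modified", p.toFile().lastModified());
        // Real content extraction (Tika, ExtractingRequestHandler, ...) would go here.
        try {
            solr.add(doc);
        } catch (SolrServerException | IOException e) {
            e.printStackTrace();
        }
    }
}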

Best,
Erick

On Mon, Aug 17, 2015 at 12:21 PM, coolmals  wrote:
> I want to update the index of a file only if last_modified has changed in the
> file. I am running post.jar with fileTypes="*", i would want to update the
> index of the files only if there is any change in them since the last update
> of index. Can you let me know how to achieve this?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-indexing-based-on-last-modified-tp4223506.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR cloud node has a 4k index directory

2015-08-17 Thread Erick Erickson
This will certainly force-replicate the entire index but I
question whether it was necessary.

These are basically snapshots that are different versions and
do not necessarily have any correlation to the _current_ index.
If you're not having any search problems, i.e. if all the replicas
are returning the same number of documents, you shouldn't need
to be this drastic.

A quick check is to issue a q=*:*&distrib=false against each member
of a shard. If the numFound is identical, then your shards are in sync.

You really can't tell much of anything from even the segment counts
or file names in the index directory on replicas in a shard. Since Solr
forwards the _raw_ document to each replica, and since most commits
are time-based, they'll often vary in terms of segments. Then the
merge policy kicks in and since the segments on one replica may be
slightly different from another, different segments may be chosen to
merge

Other than seeing all the directories laying around, what was the problem?
Was there an operational issue like differing numbers of docs found for
identical queries or anything else strange besides just more directories
than you expected?

FWIW,
Erick

On Mon, Aug 17, 2015 at 11:54 AM, Jeff Courtade  wrote:
> Hi,
>
> So it turns out that the index directory has nothign to do with what index
> is actually in use
>
> I found that we had mismatched version numbers on our shards so this is
> what we had to do to fix that.
>
>
> In production today we discovered that our shard replicas were on different
> version numbers.
> this means that the shards had some differences in data between leader and
> replica.
>
> we have two shards
>
> shard1 ps01 ps03
> shard2 ps02 ps04
>
> checking this url showed different version numbers on a given shard. Both
> leader and replica is a shard should have the same version number.
>
>
> http://ps0X:8983/solr/#/~cores/collection1
>
> shard1 ps01 7752045 ps03 7752095
>
> shard2 ps02 7792045 ps04 7790323
>
> So to fix this we did the following.
>
> Stop ingestion/aspire no updates should be being made while you are doing
> this.
>
>
> For each shard stop the server with the lowest version number
> In this case it is shard1 ps01 shard2 ps04
>
> so stop solr on ps01
> ps03 will become leader if it is not already in the cloud console
>
> http://ps0X:8983/solr/#/~cloud
>
> then move or remove everything in this directory. It should be empty.
>
>
> /opt/solr/solr-4.7.2/example/solr/collection1/data/
>
> restart solr on ps01
>
> watch that data directory it should get a few files and an
> index.201508X directory where the index is downloaded form the leader
>
> du -sh should show that growing.
>
> in the cloud console ps01 will show as recovering while this is going on
> until it is complete. Once it is done it will go green in teh cloud console.
>
> once it is green check the version number on ps01 and ps03 they should be
> the same now.
>
> Repeat this for shard2 and you are done.
>
>
> --
> Thanks,
>
> Jeff Courtade
> M: 240.507.6116
>
> On Mon, Aug 17, 2015 at 10:57 AM, Jeff Courtade 
> wrote:
>
>> Hi,
>>
>> I have SOLR cloud running on SOLR 4.7.2
>> 2 shards one replica each.
>>
>> The size of the index directories is odd on ps03
>>
>>
>> ps01 shard1 leader
>>
>> 41G
>> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150815024352580
>>
>>
>> ps03 shard 2 replica
>>
>> 59G
>> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20140906125148419
>> 18G
>> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150129181248396
>> 24G
>> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150511233039042
>> 24G
>> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150527121503268
>> 41G
>> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150806034052366
>> 4.0K
>>  /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150814152030017
>>
>>
>> ps02 shard 2 leader
>>
>> 31G
>> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150527161148429
>> 39G
>> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150815151640598
>>
>>
>> ps04 shard 2 replica
>>
>> 61G
>> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20140820212651780
>> 39G
>> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150815170546642
>>
>>
>> what can i do to remedy this?
>>
>> --
>> Thanks,
>>
>> Jeff Courtade
>> M: 240.507.6116
>>


Re: SOLR to pivot on date range query

2015-08-17 Thread Yonik Seeley
The JSON Facet API can embed any type of facet within any other type:
http://yonik.com/json-facet-api/

json.facet={
  dates : {
    type : range,
    field : entryDate,
    start : "2001-...",  // use full solr date format
    end : "2015...",
    gap : "+1MONTH",
    facet : {
      types : {          // sub-facets are named
        type : terms,
        field : entryType
      }
    }
  }
}

-Yonik


On Mon, Aug 17, 2015 at 3:16 PM, Lewin Joy (TMS)  wrote:
> Hi,
>
> I have data that is coming in everyday. I need to query the index for a time 
> range and give the facet counts ordered by different months.
> For this, I just have a solr date field, entryDate which captures the time.
>
> How do I make this query? I need the results like below.
>
> Jan-2015 (2000)
> entryType=Sales(750)
> entryType=Complaints(200)
> entryType=Feedback(450)
> Feb-2015(3200)
> entryType=Sales(1000)
> entryType=Complaints(250)
> entryType=Feedback(600)
> Mar-2015(2800)
> entryType=Sales(980)
> entryType=Complaints(220)
> entryType=Feedback(400)
>
>
> I tried Range queries on 'entryDate' field to order the result facets by 
> month.
> But, I am not able to pivot on the 'entryType' field to bring the counts of 
> "sales,complaints and feedback" type record by month.
>
> For now, I am creating another field at index time to have the value for 
> "MONTH-YEAR" derived from the 'entryDate' field.
> But for older records, it becomes a hassle. Is there a way I can handle this 
> at query time?
> Or is there a better way to handle this situation?
>
> Please let me know. Any thoughts / suggestions are valuable.
>
> Thanks,
> Lewin
>


Solr indexing based on last_modified

2015-08-17 Thread coolmals
I want to update the index of a file only if last_modified has changed in the
file. I am running post.jar with fileTypes="*"; I would want to update the
index of the files only if there has been any change in them since the last
update of the index. Can you let me know how to achieve this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-based-on-last-modified-tp4223506.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLR to pivot on date range query

2015-08-17 Thread Lewin Joy (TMS)
Hi,

I have data that is coming in every day. I need to query the index for a time 
range and give the facet counts ordered by different months.
For this, I just have a Solr date field, entryDate, which captures the time.

How do I make this query? I need the results like below.

Jan-2015 (2000)
entryType=Sales(750)
entryType=Complaints(200)
entryType=Feedback(450)
Feb-2015(3200)
entryType=Sales(1000)
entryType=Complaints(250)
entryType=Feedback(600)
Mar-2015(2800)
entryType=Sales(980)
entryType=Complaints(220)
entryType=Feedback(400)

 
I tried range queries on the 'entryDate' field to order the result facets by 
month, but I am not able to pivot on the 'entryType' field to bring the counts 
of "sales, complaints and feedback" type records by month.

For now, I am creating another field at index time to have the value for 
"MONTH-YEAR" derived from the 'entryDate' field.
But for older records, it becomes a hassle. Is there a way I can handle this at 
query time? 
Or is there a better way to handle this situation?

Please let me know. Any thoughts / suggestions are valuable.

Thanks,
Lewin



Re: SOLR cloud node has a 4k index directory

2015-08-17 Thread Jeff Courtade
Hi,

So it turns out that the index directories have nothing to do with which index
is actually in use.

I found that we had mismatched version numbers on our shards so this is
what we had to do to fix that.


In production today we discovered that our shard replicas were on different
version numbers.
this means that the shards had some differences in data between leader and
replica.

we have two shards

shard1 ps01 ps03
shard2 ps02 ps04

Checking this URL showed different version numbers on a given shard. Both
the leader and the replica in a shard should have the same version number.


http://ps0X:8983/solr/#/~cores/collection1

shard1 ps01 7752045 ps03 7752095

shard2 ps02 7792045 ps04 7790323

So to fix this we did the following.

Stop ingestion/aspire; no updates should be made while you are doing
this.


For each shard, stop the server with the lowest version number.
In this case it is ps01 for shard1 and ps04 for shard2.

So stop Solr on ps01.
ps03 will become leader if it is not already; you can check in the cloud console:

http://ps0X:8983/solr/#/~cloud

Then move or remove everything in this directory, so that it is empty:


/opt/solr/solr-4.7.2/example/solr/collection1/data/

Restart Solr on ps01.

Watch that data directory; it should get a few files and an
index.201508X directory into which the index is downloaded from the leader.

du -sh should show that growing.

In the cloud console ps01 will show as recovering while this is going on,
until it is complete. Once it is done it will go green in the cloud console.

Once it is green, check the version numbers on ps01 and ps03; they should be
the same now.

Repeat this for shard2 and you are done.


--
Thanks,

Jeff Courtade
M: 240.507.6116

On Mon, Aug 17, 2015 at 10:57 AM, Jeff Courtade 
wrote:

> Hi,
>
> I have SOLR cloud running on SOLR 4.7.2
> 2 shards one replica each.
>
> The size of the index directories is odd on ps03
>
>
> ps01 shard1 leader
>
> 41G
> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150815024352580
>
>
> ps03 shard 2 replica
>
> 59G
> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20140906125148419
> 18G
> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150129181248396
> 24G
> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150511233039042
> 24G
> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150527121503268
> 41G
> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150806034052366
> 4.0K
>  /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150814152030017
>
>
> ps02 shard 2 leader
>
> 31G
> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150527161148429
> 39G
> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150815151640598
>
>
> ps04 shard 2 replica
>
> 61G
> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20140820212651780
> 39G
> /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150815170546642
>
>
> what can i do to remedy this?
>
> --
> Thanks,
>
> Jeff Courtade
> M: 240.507.6116
>


Re: joins

2015-08-17 Thread Upayavira
The question as I read it was composite documents, not cross-collection
joins.

If the joined core is small enough to be replicated across all replicas
of your main collection, then cross-core joining works well, as it is
all within one instance.
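
For reference, such a cross-core join looks roughly like this (core and field 
names are invented for the example):

q={!join fromIndex=usercore from=user_id to=owner_id}dept:engineering

The inner query runs against the small core named by fromIndex, and that core 
must live in the same Solr instance as the collection being queried -- which is 
exactly why it needs to be replicated alongside every replica of the main 
collection.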

As to composite documents, I have sometimes wondered whether a
SearchComponent (such as the ExpandComponent) or a DocTransformer could
facilitate this. Once the 10 documents that are to be returned to the
user have been suggested, retro-fitting the joined documents to them
shouldn't be a huge amount of work, performance-wise.

What it'd take, though, is a reasonable amount of thinking to get said
components right.

Upayavira

On Mon, Aug 17, 2015, at 05:19 PM, Erick Erickson wrote:
> True, I haven't looked at it closely. Not sure where it is in the
> priority list though.
> 
> However, I would recommend you _really_ look at _why_  you
> think you need cross-collection joins. Likely they will be expensive,
> and whether they're performant in your situation will be a question.
> 
> If at all possible, Solr is much happier/more flexible if you can
> denormalize your data and live with the fact that your index
> will be bigger.
> 
> Best,
> Erick
> 
> On Sun, Aug 16, 2015 at 12:59 PM, naga sharathrayapati
>  wrote:
> > https://issues.apache.org/jira/browse/SOLR-7090
> >
> > I see this jira open in support of joins which might solve the problem.
> >
> > On Sun, Aug 16, 2015 at 2:51 PM, Erick Erickson 
> > wrote:
> >
> >> bq: Is there any chance of this feature(merge the results to create a
> >> composite
> >> document) coming out in the next release 5.3
> >>
> >> In a word "no". And there aren't really any long-range plans either that
> >> I'm
> >> aware of.
> >>
> >> You could also explore streaming aggregation, if the need here is more
> >> batch-oriented.
> >>
> >> If at all possible, Solr is much more flexible if you can de-normlize your
> >> data
> >> rather than try to make Solr work like an RDBMS. Of course it goes against
> >> the training of all DB Admins, but it's often the best option.
> >>
> >> So have you explored denormalizing and do you know it's not a viable
> >> option?
> >>
> >> Best,
> >> Erick
> >>
> >> On Sun, Aug 16, 2015 at 12:45 PM, naga sharathrayapati
> >>  wrote:
> >> > Is there any chance of this feature(merge the results to create a
> >> composite
> >> > document) coming out in the next release 5.3 ?
> >> >
> >> > On Sun, Aug 16, 2015 at 2:08 PM, Upayavira  wrote:
> >> >
> >> >> You can do what are called "pseudo joins", which are eqivalent to a
> >> >> nested query in SQL. You get back data from one core, based upon
> >> >> criteria in the other. You cannot (yet) merge the results to create a
> >> >> composite document.
> >> >>
> >> >> Upayavira
> >> >>
> >> >> On Sun, Aug 16, 2015, at 06:02 PM, Nagasharath wrote:
> >> >> > I exactly have the same requirement
> >> >> >
> >> >> >
> >> >> >
> >> >> > > On 13-Aug-2015, at 2:12 pm, Kiran Sai Veerubhotla <
> >> sai.sq...@gmail.com>
> >> >> wrote:
> >> >> > >
> >> >> > > does solr support joins?
> >> >> > >
> >> >> > > we have a use case where two collections have to be joined and the
> >> >> join has
> >> >> > > to be on the faceted results of the two collections. is this
> >> possible?
> >> >>
> >>


Re: Query time out. Solr node goes down.

2015-08-17 Thread Daniel Collins
When you say "the solr node goes down", what do you mean by that? From your
comment on the logs, you obviously lose the solr core at best (you do
realize only having a single replica is inherently susceptible to failure,
right?)
But do you mean the Solr Core drops out of the collection (ZK timeout), the
JVM stops, the whole machine crashes?

On 17 August 2015 at 14:17, Shawn Heisey  wrote:

> On 8/17/2015 5:45 AM, Modassar Ather wrote:
> > The servers have 32g memory each. Solr JVM memory is set to -Xms20g
> > -Xmx24g. There are no OOM in logs.
>
> Are you starting Solr 5.2.1 with the included start script, or have you
> installed it into another container?
>
> Assuming you're using the download's "bin/solr" script, that will
> normally set Xms and Xmx to the same value, so if you have overridden
> the memory settings such that you can have different values in Xms and
> Xmx, have you also overridden the garbage collection parameters?  If you
> have, what are they set to now?  You can see all arguments used on
> startup in the "JVM" section of the admin UI dashboard.
>
> If you've installed in an entirely different container, or you have
> overridden the garbage collection settings, then a 24GB heap might have
> extreme garbage collection pauses, lasting long enough to exceed the
> timeout.
>
> Giving 24 out of 32GB to Solr will mean that there is only (at most) 8GB
> left over for caching the index.  With 200GB of index, this is nowhere
> near enough, and is another likely source of Solr performance problems
> that cause timeouts.  This is what Upayavira was referring to in his
> reply.  For good performance with 200GB of index, you may need a lot
> more than 32GB of total RAM.
>
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> This wiki page also describes how you can use jconsole to judge how much
> heap you actually need.  24GB may be too much.
>
> Thanks,
> Shawn
>
>


Re: joins

2015-08-17 Thread Erick Erickson
True, I haven't looked at it closely. Not sure where it is in the
priority list though.

However, I would recommend you _really_ look at _why_  you
think you need cross-collection joins. Likely they will be expensive,
and whether they're performant in your situation will be a question.

If at all possible, Solr is much happier/more flexible if you can
denormalize your data and live with the fact that your index
will be bigger.

Best,
Erick

On Sun, Aug 16, 2015 at 12:59 PM, naga sharathrayapati
 wrote:
> https://issues.apache.org/jira/browse/SOLR-7090
>
> I see this jira open in support of joins which might solve the problem.
>
> On Sun, Aug 16, 2015 at 2:51 PM, Erick Erickson 
> wrote:
>
>> bq: Is there any chance of this feature(merge the results to create a
>> composite
>> document) coming out in the next release 5.3
>>
>> In a word "no". And there aren't really any long-range plans either that
>> I'm
>> aware of.
>>
>> You could also explore streaming aggregation, if the need here is more
>> batch-oriented.
>>
>> If at all possible, Solr is much more flexible if you can de-normlize your
>> data
>> rather than try to make Solr work like an RDBMS. Of course it goes against
>> the training of all DB Admins, but it's often the best option.
>>
>> So have you explored denormalizing and do you know it's not a viable
>> option?
>>
>> Best,
>> Erick
>>
>> On Sun, Aug 16, 2015 at 12:45 PM, naga sharathrayapati
>>  wrote:
>> > Is there any chance of this feature(merge the results to create a
>> composite
>> > document) coming out in the next release 5.3 ?
>> >
>> > On Sun, Aug 16, 2015 at 2:08 PM, Upayavira  wrote:
>> >
>> >> You can do what are called "pseudo joins", which are eqivalent to a
>> >> nested query in SQL. You get back data from one core, based upon
>> >> criteria in the other. You cannot (yet) merge the results to create a
>> >> composite document.
>> >>
>> >> Upayavira
>> >>
>> >> On Sun, Aug 16, 2015, at 06:02 PM, Nagasharath wrote:
>> >> > I exactly have the same requirement
>> >> >
>> >> >
>> >> >
>> >> > > On 13-Aug-2015, at 2:12 pm, Kiran Sai Veerubhotla <
>> sai.sq...@gmail.com>
>> >> wrote:
>> >> > >
>> >> > > does solr support joins?
>> >> > >
>> >> > > we have a use case where two collections have to be joined and the
>> >> join has
>> >> > > to be on the faceted results of the two collections. is this
>> possible?
>> >>
>>


Re: No. of records mismatch

2015-08-17 Thread Erick Erickson
A couple of things:
1> Be a little careful looking at deletedDocs, maxDocs and numDocs
when you're done. Deleted (or updated) docs are "merged away" as
segments merge. deletedDocs isn't a count of all docs that _have_
been deleted, it's just a count of the docs that have been
deleted/updated but not yet merged away.

2> bq: "I do not see any deletions". That figure isn't a count of unique IDs
replaced, but the number of explicit deletions. Having it at 0
doesn't mean that docs haven't been updated.

3> bq: "Yes it is not absolutely unique but do not think it is at this
1 to 6 ratio". Check your assumption here. Assuming this is a
database, select the count of distinct values in whatever column maps to
your <uniqueKey> (see the sketch after this list).

4> Is this a sharded situation? It shouldn't matter, you should get a
full count unless you explicitly are adding &distrib=false, just
checking.

5> If none of that is the problem, let's see your config etc.
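
A quick way to check both sides; the core name, table and column below are
placeholders, not anything from your setup:

# Solr side: numDocs vs maxDoc (deleted-but-not-yet-merged = maxDoc - numDocs)
curl "http://localhost:8983/solr/collection1/admin/luke?numTerms=0&wt=json"

# Source side, assuming a SQL database: how many distinct keys really exist
SELECT COUNT(DISTINCT id_column) FROM source_table;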

Best,
Erick

On Sun, Aug 16, 2015 at 11:57 PM, davidphilip cherian
 wrote:
> Hi,
>
> You should check whether there were deletions by navigating to the Solr
> core admin page, for example
> http://localhost:8983/solr/#/~cores/test_shard1_replica1, and check for
> numDocs, maxDocs and deletedDocs. If numDocs remains equal to maxDocs, then
> you can confirm that there were no updates (as recommended by Upayavira).
>
> HTH
>
> On Mon, Aug 17, 2015 at 4:41 AM, Pattabiraman, Meenakshisundaram <
> pattabiraman.meenakshisunda...@aig.com> wrote:
>
>> " You almost certainly have a non-unique ID field."
>> Yes, it is not absolutely unique, but I do not think it is at this 1 to 6
>> ratio.
>>
>> "Try it with a clean index, and then review the number of deleted
>> documents (updates are a delete then insert action) "
>> I tried on a new instance - same effect. I do not see any deletions. Is
>> there a way to determine this from the logs to confirm that the behavior is
>> due to non-uniqueness? This will serve as an assurance.
>> Thanks
>>
>> 6843469
>> 6843469
>> 0
>> 2015-08-16 21:22:24
>> 
>> Indexing completed. Added/Updated: 6843469 documents. Deleted 0 documents.
>> 
>> 2015-08-16 22:31:47
>>
>> Whereas '*:*'
>> "params":{
>>   "q":"*:*"}},
>>   "response":{"numFound":1143108,"start":0,"docs":[
>>
>> -Original Message-
>> From: Upayavira [mailto:u...@odoko.co.uk]
>> Sent: Sunday, August 16, 2015 3:18 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: No. of records mismatch
>>
>> You almost certainly have a non-unique ID field. Some documents are
>> overwritten during indexing. Try it with a clean index, and then review the
>> number of deleted documents (updates are a delete then insert action).
>> Deletes are calculated with maxDocs minus numDocs.
>>
>> Upayavira
>>
>> On Sun, Aug 16, 2015, at 07:18 PM, Pattabiraman, Meenakshisundaram
>> wrote:
>> > I did a dataimport with 'clean' set to false.
>> > The DIH status upon completion was:
>> >
>> > idle
>> > 
>> > 
>> > 1 6843427 6843427 0
>> > 2015-08-16 16:50:54 
>> > Indexing completed. Added/Updated: 6843427 documents. Deleted 0
>> > documents.
>> > 
>> > Whereas when I query using 'query?q=*:*&rows=0', I get the following
>> > count {
>> >   "responseHeader":{
>> > "status":0,
>> > "QTime":1,
>> > "params":{
>> >   "q":"*:*",
>> >   "rows":"0"}},
>> >   "response":{"numFound":1616376,"start":0,"docs":[]
>> >   }}
>> >
>> > There is a difference of 5 million records. Can anyone help me
>> > understand the behavior? The logs look fine.
>> > Thanks
>>


Re: Solr start with existing collection

2015-08-17 Thread Erick Erickson
Something like this works:

bin/solr start -c -z localhost:2181 -p 8981 -s example/cloud/node1/solr

When you first start Solr with bin/solr you get a prompt asking for how many
servers you want to run and what their ports are. The script will echo
the command
you can use to start the Solr instances and it'll be something like the above.
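
If the node was created that way originally, nothing needs to be recreated;
pointing the start (or restart) command at the same solr home and ZooKeeper
brings the existing collection back. For example, with the default cloud
example paths:

bin/solr restart -c -z localhost:2181 -p 8981 -s example/cloud/node1/solr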

Best,
Erick

On Mon, Aug 17, 2015 at 8:08 AM, coolmals  wrote:
> I am a new user to solr. I want to start my solr service without creating any
> core/collection and using the existing core/ collection. Can you let me know
> how?
>
> I have already created a collection and have indexed my data in that. My
> server is stopped and i want to restart the server using the existing
> collection which has indexed data. Can you let me know how?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-start-with-existing-collection-tp4223456.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Solr start with existing collection

2015-08-17 Thread coolmals
I am a new user to Solr. I want to start my Solr service without creating any
core/collection, instead using the existing core/collection. Can you let me know
how?

I have already created a collection and have indexed my data in it. My
server is stopped and I want to restart the server using the existing
collection which has indexed data. Can you let me know how?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-start-with-existing-collection-tp4223456.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLR cloud node has a 4k index directory

2015-08-17 Thread Jeff Courtade
Hi,

I have SOLR cloud running on SOLR 4.7.2
2 shards one replica each.

The size of the index directories is odd on ps03


ps01 shard1 leader

41G
/opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150815024352580


ps03 shard 2 replica

59G
/opt/solr/solr-4.7.2/example/solr/collection1/data/index.20140906125148419
18G
/opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150129181248396
24G
/opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150511233039042
24G
/opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150527121503268
41G
/opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150806034052366
4.0K
 /opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150814152030017


ps02 shard 2 leader

31G
/opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150527161148429
39G
/opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150815151640598


ps04 shard 2 replica

61G
/opt/solr/solr-4.7.2/example/solr/collection1/data/index.20140820212651780
39G
/opt/solr/solr-4.7.2/example/solr/collection1/data/index.20150815170546642


What can I do to remedy this?

--
Thanks,

Jeff Courtade
M: 240.507.6116


RE: Admin Login

2015-08-17 Thread Davis, Daniel (NIH/NLM) [C]
I place Apache Solr behind Apache httpd with a pure HTTP reverse proxy, since
most of the time it will be used as an API.   I use mod_auth_cas to protect the
general /solr URL, requiring a login that refers to our common Jasig CAS
server, which in turn connects to our Microsoft Active Directory.

For each core, I reverse proxy the select and update handlers, and any others that
are needed, under somewhat self-descriptive URLs.   In the sample
configuration below, please understand that I've hidden the
actual allowed IP address and mask appropriately.


ProxyPass http://127.0.0.1:8983/solr/learningresources/select
ProxyPassReverse http://127.0.0.1:8983/solr/learningresources/select
Options -MultiViews
Order allow,deny
Allow from 999.999.999.999/24 127.0.0.1


I believe you can do all this within Jetty, but I and my system administrators 
know and trust Apache httpd. 
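
For anyone copying the pattern, the full stanza, wrapped in a <Location> block,
looks roughly like this; the location path, core name and addresses are
placeholders rather than the real values:

<Location /searchapp/select>
    ProxyPass http://127.0.0.1:8983/solr/learningresources/select
    ProxyPassReverse http://127.0.0.1:8983/solr/learningresources/select
    Options -MultiViews
    Order allow,deny
    Allow from 192.0.2.0/24 127.0.0.1
</Location>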

-Original Message-
From: Scott Derrick [mailto:sc...@tnstaafl.net] 
Sent: Saturday, August 15, 2015 7:16 PM
To: solr-user@lucene.apache.org
Subject: Admin Login

I'm somewhat puzzled there is no built-in security.  I can't imagine anybody is
running a public-facing Solr server with the admin page wide open?

I've searched and haven't found any solutions that work out of the box.

I've tried the solutions here to no avail. 
https://wiki.apache.org/solr/SolrSecurity

and here.  http://wiki.eclipse.org/Jetty/Tutorial/Realms

The Solr security docs say to use the application server and if I could run it 
on my tomcat server I would already be done.  But I'm told I can't do that?

What solutions are people using?

Scott

--
Leave no stone unturned.
Euripides


RE: Admin Login

2015-08-17 Thread Davis, Daniel (NIH/NLM) [C]
Yes - adding to my post, I actually have a python script that verifies that 
handleSelect="false" for each core's solrconfig.xml.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, August 15, 2015 11:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Admin Login

Scott:

You better not even let them access Solr directly.

http://server:port/solr/admin/collections?ACTION=delete&name=collection.

Try it sometime on a collection that's not important ;)

But as Walter said, that'd be similar to allowing end users unrestricted access
to a SQL database; that Solr URL is akin to "drop database".

Or, if you've locked down the admin stuff,

http://solr:port/solr/collection/update?commit=true&stream.body=<delete><query>*:*</query></delete>

Best
Erick

On Sat, Aug 15, 2015 at 6:57 PM, Scott Derrick  wrote:
> Walter,
>
> actually that explains it perfectly!  I will move behind my apache server...
>
> thanks,
>
> Scott
>
>
> On 8/15/2015 6:15 PM, Walter Underwood wrote:
>>
>> No one runs a public-facing Solr server. Just like no one runs a 
>> public-facing MySQL server.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>> On Aug 15, 2015, at 4:15 PM, Scott Derrick  wrote:
>>
>>> I'm somewhat puzzled there is no built-in security.  I can't imagine
>>> anybody is running a public-facing Solr server with the admin page
>>> wide open?
>>>
>>> I've searched and haven't found any solutions that work out of the box.
>>>
>>> I've tried the solutions here to no avail.
>>> https://wiki.apache.org/solr/SolrSecurity
>>>
>>> and here.  http://wiki.eclipse.org/Jetty/Tutorial/Realms
>>>
>>> The Solr security docs say to use the application server and if I 
>>> could run it on my tomcat server I would already be done.  But I'm 
>>> told I can't do that?
>>>
>>> What solutions are people using?
>>>
>>> Scott
>>>
>>> --
>>> Leave no stone unturned.
>>> Euripides
>>
>>
>
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>


Re: Issue while setting Solr on Slider / YARN

2015-08-17 Thread Timothy Potter
Hi Vijay,

Verify the ResourceManager URL and try passing the --manager param to
explicitly set the ResourceManager URL during the create step.
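
Concretely, that would be the create command from your mail with the extra
flag; the host:port is whatever your ResourceManager really listens on, and
the exact flag syntax is worth double-checking against the Slider client docs:

slider create solr --template /user/hdfs/solr-slider/appConfig-default.json \
    --resources /user/hdfs/solr-slider/resources-default.json \
    --manager myhdpcluster.com:8032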

Cheers,
Tim

On Mon, Aug 17, 2015 at 4:37 AM, Vijay Bhoomireddy
 wrote:
> Hi,
>
>
>
> Any help on this please?
>
>
>
> Thanks & Regards
>
> Vijay
>
>
>
> From: Vijay Bhoomireddy [mailto:vijaya.bhoomire...@whishworks.com]
> Sent: 14 August 2015 18:03
> To: solr-user@lucene.apache.org
> Subject: Issue while setting Solr on Slider / YARN
>
>
>
> Hi,
>
>
>
> We have a requirement of setting up of Solr Cloud to work along with Hadoop.
> Earlier, I could setup a SolrCloud cluster separately alongside the Hadoop
> cluster i.e. it looks like two logical  clusters sitting next to each other,
> both relying on HDFS.
>
>
>
> However, the experiment now I am trying to do is to install SolrCloud on
> YARN using Apache Slider. I am following LucidWorks blog at
> https://github.com/LucidWorks/solr-slider for the same. I already have a
> Hortonworks HDP cluster. When I try to setup Solr on my HDP cluster using
> Slider, I am facing some issues.
>
>
>
> As per the blog, I have performed the below steps:
>
>
>
> 1.   I have setup a single node HDP cluster for which the hostname is
> myhdpcluster.com with all the essential services including ZooKeeper and
> Slider running on it.
>
> 2.   Updated the resource manager address and port in slider-client.xml
> present under /var/hdp/current/slider/conf
>
> 
>
> yarn.resourcemanager.address
>
>  myhdpcluster.com:8032
>
> 
>
> 3.   Cloned the LucidWorks git and moved it under /user/hdfs/solr-slider
>
> 4.   Downloaded solr latest stable distribution and renamed it as
> solr.tgz and placed it under /user/hdfs/solr-slider/package/files/solr.tgz
>
> 5.   Next ran the following command from within the
> /user/hdfs/solr-slider folder
>
> zip -r solr-on-yarn.zip metainfo.xml package/
>
> 6.   Next ran the following command as hdfs user
>
> slider install-package --replacepkg --name solr --package
> /user/hdfs/solr-slider/solr-on-yarn.zip
>
> 7.   Modified the following settings in the
> /user/hdfs/solr-slider/appConfig-default.json file
>
> "java_home": MY_JAVA_HOME_LOCATION
>
> "site.global.app_root": "${AGENT_WORK_ROOT}/app/install/solr-5.2.1",
> (Should this be changed to any other value?)
>
> "site.global.zk_host": " myhdpcluster.com:2181",
>
> 8.   Set yarn.component.instances to 1 in resources-default.json file
>
> 9.   Next ran the following command
>
> slider create solr --template /user/hdfs/solr-slider/appConfig-default.json
> --resources /user/hdfs/solr-slider/resources-default.json
>
>
>
> During this step, I am seeing an message INFO client.RMProxy - Connecting to
> ResourceManager at myhdpcluster.com/10.0.2.15:8032
>
>
> INFO ipc.Client - Retrying connect to server:
> myhdpcluster.com/10.0.2.15:8032. Already tried 0 time(s);
>
>
>
> This message keeps repeating for 50 times and then pauses for a couple of
> seconds and then prints the same message in a loop eternally. Not sure on
> where the problem is.
>
>
>
> Can anyone please help me out to get away from this issue and help me setup
> Solr on Slider/YARN?
>
>
>
> Thanks & Regards
>
> Vijay
>
>
> --
> The contents of this e-mail are confidential and for the exclusive use of
> the intended recipient. If you receive this e-mail in error please delete
> it from your system immediately and notify us either by e-mail or
> telephone. You should not copy, forward or otherwise disclose the content
> of the e-mail. The views expressed in this communication may not
> necessarily be the view held by WHISHWORKS.


Re: Solr Caching (documentCache) not working

2015-08-17 Thread Shawn Heisey
On 8/17/2015 7:04 AM, Maulin Rathod wrote:
> We have observed that Intermittently querying become slower when 
> documentCache become empty. The documentCache is getting flushed whenever new 
> document added to the collection.
> 
> Is there any way by which we can ensure that newly added documents are 
> visible without losing data in documentCache? We are trying to use soft 
> commit but it also flushes all documents in documentCache.



> <autoSoftCommit>
>   <maxTime>50</maxTime>
> </autoSoftCommit>

You are doing a soft commit within 50 milliseconds of adding a new
document.  Solr can have severe performance problems when autoSoftCommit
is set to 1000 -- one second.  50 milliseconds is one twentieth of a
very low value that is known to cause problems.  It can make the problem
much more than 20 times worse.

Please read this article:

http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Note one particular section, which says the following:  Don’t listen to
your product manager who says "we need no more than 1 second latency".

You need to set your commit interval as long as you possibly can.  I
personally wouldn't go shorter than 60 seconds, or 30 seconds if the commits
complete particularly fast.  It should be several minutes if that will
meet your needs.  When your commit interval is very low, Solr's caches
can become useless, as you've noticed.

TL;DR info:  Your autoCommit settings have openSearcher set to false, so
they do not matter for the problem you have described. I would probably
increase that to 5 minutes rather than 15 seconds, but that is not very
important here, and 15 seconds for hard commits that don't open a new
searcher is known to have a low impact on performance.  "Low" impact
isn't the same as NO impact, so I keep this interval long as well.
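
Putting those numbers into the same snippet, a sketch to adapt; the values
are examples, not a prescription:

<autoCommit>
  <maxTime>300000</maxTime>          <!-- 5 minutes, hard commit, no new searcher -->
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>60000</maxTime>           <!-- 60 seconds; longer if your latency needs allow it -->
</autoSoftCommit>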

Thanks,
Shawn



Result Grouping: Number of results in group is not according to specs

2015-08-17 Thread Arnon Yogev
When using result grouping, Solr specs state the following about the "rows"
and "group.limit" params:
rows - The number of groups to return.
group.limit -  Number of rows to return in each group.
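
For example, in a request shaped like the one below (the field name and ranges
are made up), the spec says group.limit should control how many documents come
back inside each group.query group, independent of rows:

q=*:*&group=true&group.query=price:[0 TO 100]&group.query=price:[100 TO *]&group.limit=5&rows=10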

We are using Solr cloud with a single collection and 64 shards.
When grouping by field (i.e. using the group.field parameter), the behavior
is as expected.
However, when grouping by query (using group.query), the number of
documents inside each group is affected by the rows param, instead of the
group.limit param.
This is different than what is mentioned in the specs.

Moreover, when switching to a non-sharded environment (64 collections, 1
shard per collection), the behavior is different, and the number of docs
inside each group is affected by the group.limit param, as expected.

Is this a known issue?
We couldn't find a relevant JIRA ticket.

Thanks,
Arnon


Re: Query time out. Solr node goes down.

2015-08-17 Thread Shawn Heisey
On 8/17/2015 5:45 AM, Modassar Ather wrote:
> The servers have 32g memory each. Solr JVM memory is set to -Xms20g
> -Xmx24g. There are no OOM in logs.

Are you starting Solr 5.2.1 with the included start script, or have you
installed it into another container?

Assuming you're using the download's "bin/solr" script, that will
normally set Xms and Xmx to the same value, so if you have overridden
the memory settings such that you can have different values in Xms and
Xmx, have you also overridden the garbage collection parameters?  If you
have, what are they set to now?  You can see all arguments used on
startup in the "JVM" section of the admin UI dashboard.

If you've installed in an entirely different container, or you have
overridden the garbage collection settings, then a 24GB heap might have
extreme garbage collection pauses, lasting long enough to exceed the
timeout.

Giving 24 out of 32GB to Solr will mean that there is only (at most) 8GB
left over for caching the index.  With 200GB of index, this is nowhere
near enough, and is another likely source of Solr performance problems
that cause timeouts.  This is what Upayavira was referring to in his
reply.  For good performance with 200GB of index, you may need a lot
more than 32GB of total RAM.

https://wiki.apache.org/solr/SolrPerformanceProblems

This wiki page also describes how you can use jconsole to judge how much
heap you actually need.  24GB may be too much.
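
For reference, the stock script derives both heap values from a single option,
along these lines; the size here is only an example, not a recommendation for
your setup:

bin/solr start -m 8g    # sets -Xms8g and -Xmx8g and keeps the script's default GC flags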

Thanks,
Shawn



Solr Caching (documentCache) not working

2015-08-17 Thread Maulin Rathod

Hi,

We are using Solr Cloud version 5.2.

We have observed that, intermittently, querying becomes slower when the documentCache
becomes empty. The documentCache is getting flushed whenever a new document is added
to the collection.

Is there any way by which we can ensure that newly added documents are visible 
without losing data in documentCache? We are trying to use soft commit but it 
also flushes all documents in documentCache.

We have the following settings in solrconfig.xml.



<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>50</maxTime>
</autoSoftCommit>




Regards,

Maulin

[CC Award Winners 2014]



Re: Query time out. Solr node goes down.

2015-08-17 Thread Modassar Ather
Thanks Upayavira for your inputs. The Java version is 1.7.0_79.

On Mon, Aug 17, 2015 at 5:57 PM, Upayavira  wrote:

> Hoping that others will chime in here with other ideas. Have you,
> though, tried reducing the JVM memory, leaving more available for the OS
> disk cache? Having said that, I'd expect that to improve performance,
> not to cause JVM crashes.
>
> It might also help to know what version of Java you are running.
>
> Upayavira
>
> On Mon, Aug 17, 2015, at 12:45 PM, Modassar Ather wrote:
> > The servers have 32g memory each. Solr JVM memory is set to -Xms20g
> > -Xmx24g. There are no OOM in logs.
> >
> > Regards,
> > Modassar
> >
> > On Mon, Aug 17, 2015 at 5:06 PM, Upayavira  wrote:
> >
> > > How much memory does each server have? How much of that memory is
> > > assigned to the JVM? Is anything reported in the logs (e.g.
> > > OutOfMemoryError)?
> > >
> > > On Mon, Aug 17, 2015, at 12:29 PM, Modassar Ather wrote:
> > > > Hi,
> > > >
> > > > I have a Solr cluster which hosts around 200 GB of index on each
> node and
> > > > are 6 nodes. Solr version is 5.2.1.
> > > > When a huge query is fired, it times out *(The request took too long
> to
> > > > iterate over terms.)*, which I can see in the log but at same time
> the
> > > > one
> > > > of the Solr node goes down and the logs on the Solr nodes starts
> showing
> > > >
> > > >
> > > > *following exception.org.apache.solr.common.SolrException: no servers
> > > > hosting shard.*
> > > > For sometime the shards are not responsive and other queries are not
> > > > searched till the node(s) are back again. This is fine but what
> could be
> > > > the possible cause of solr node going down.
> > > > The other exception after the solr node goes down is leader election
> > > > related which is not a concern as there is no replica of the nodes.
> > > >
> > > > Please provide your suggestions.
> > > >
> > > > Thanks,
> > > > Modassar
> > >
>


Re: Query time out. Solr node goes down.

2015-08-17 Thread Upayavira
Hoping that others will chime in here with other ideas. Have you,
though, tried reducing the JVM memory, leaving more available for the OS
disk cache? Having said that, I'd expect that to improve performance,
not to cause JVM crashes.

It might also help to know what version of Java you are running.

Upayavira

On Mon, Aug 17, 2015, at 12:45 PM, Modassar Ather wrote:
> The servers have 32g memory each. Solr JVM memory is set to -Xms20g
> -Xmx24g. There are no OOM in logs.
> 
> Regards,
> Modassar
> 
> On Mon, Aug 17, 2015 at 5:06 PM, Upayavira  wrote:
> 
> > How much memory does each server have? How much of that memory is
> > assigned to the JVM? Is anything reported in the logs (e.g.
> > OutOfMemoryError)?
> >
> > On Mon, Aug 17, 2015, at 12:29 PM, Modassar Ather wrote:
> > > Hi,
> > >
> > > I have a Solr cluster which hosts around 200 GB of index on each node and
> > > are 6 nodes. Solr version is 5.2.1.
> > > When a huge query is fired, it times out *(The request took too long to
> > > iterate over terms.)*, which I can see in the log but at same time the
> > > one
> > > of the Solr node goes down and the logs on the Solr nodes starts showing
> > >
> > >
> > > *following exception.org.apache.solr.common.SolrException: no servers
> > > hosting shard.*
> > > For sometime the shards are not responsive and other queries are not
> > > searched till the node(s) are back again. This is fine but what could be
> > > the possible cause of solr node going down.
> > > The other exception after the solr node goes down is leader election
> > > related which is not a concern as there is no replica of the nodes.
> > >
> > > Please provide your suggestions.
> > >
> > > Thanks,
> > > Modassar
> >


Re: Query time out. Solr node goes down.

2015-08-17 Thread Modassar Ather
The servers have 32g memory each. Solr JVM memory is set to -Xms20g
-Xmx24g. There are no OOM in logs.

Regards,
Modassar

On Mon, Aug 17, 2015 at 5:06 PM, Upayavira  wrote:

> How much memory does each server have? How much of that memory is
> assigned to the JVM? Is anything reported in the logs (e.g.
> OutOfMemoryError)?
>
> On Mon, Aug 17, 2015, at 12:29 PM, Modassar Ather wrote:
> > Hi,
> >
> > I have a Solr cluster which hosts around 200 GB of index on each node and
> > are 6 nodes. Solr version is 5.2.1.
> > When a huge query is fired, it times out *(The request took too long to
> > iterate over terms.)*, which I can see in the log but at same time the
> > one
> > of the Solr node goes down and the logs on the Solr nodes starts showing
> >
> >
> > *following exception.org.apache.solr.common.SolrException: no servers
> > hosting shard.*
> > For sometime the shards are not responsive and other queries are not
> > searched till the node(s) are back again. This is fine but what could be
> > the possible cause of solr node going down.
> > The other exception after the solr node goes down is leader election
> > related which is not a concern as there is no replica of the nodes.
> >
> > Please provide your suggestions.
> >
> > Thanks,
> > Modassar
>


Re: Query time out. Solr node goes down.

2015-08-17 Thread Upayavira
How much memory does each server have? How much of that memory is
assigned to the JVM? Is anything reported in the logs (e.g.
OutOfMemoryError)?

On Mon, Aug 17, 2015, at 12:29 PM, Modassar Ather wrote:
> Hi,
> 
> I have a Solr cluster which hosts around 200 GB of index on each node and
> are 6 nodes. Solr version is 5.2.1.
> When a huge query is fired, it times out *(The request took too long to
> iterate over terms.)*, which I can see in the log but at same time the
> one
> of the Solr node goes down and the logs on the Solr nodes starts showing
> 
> 
> *following exception.org.apache.solr.common.SolrException: no servers
> hosting shard.*
> For sometime the shards are not responsive and other queries are not
> searched till the node(s) are back again. This is fine but what could be
> the possible cause of solr node going down.
> The other exception after the solr node goes down is leader election
> related which is not a concern as there is no replica of the nodes.
> 
> Please provide your suggestions.
> 
> Thanks,
> Modassar


Query time out. Solr node goes down.

2015-08-17 Thread Modassar Ather
Hi,

I have a Solr cluster of 6 nodes, which hosts around 200 GB of index on each
node. The Solr version is 5.2.1.
When a huge query is fired, it times out *(The request took too long to
iterate over terms.)*, which I can see in the log, but at the same time one
of the Solr nodes goes down and the logs on the Solr nodes start showing the
following exception:

*org.apache.solr.common.SolrException: no servers hosting shard.*

For some time the shards are not responsive and other queries are not
served till the node(s) are back again. This is fine, but what could be
the possible cause of the Solr node going down?
The other exception after the solr node goes down is leader election
related which is not a concern as there is no replica of the nodes.

Please provide your suggestions.

Thanks,
Modassar


Global pagination on grouped result doesn't work.

2015-08-17 Thread Md. Mazharul Anwar
Hello,


My goal: search Solr, group the results based on a field, and paginate through
the grouped results.

The query I used: 
group=true&group.field=customer_company_name&group.ngroups=true


I get a result set of:


{
  "responseHeader": {
    "status": 0,
    "QTime": 20,
    "params": {
      "group.ngroups": "true",
      "indent": "true",
      "q": "customer_company_name_lc:com*",
      "group.field": "customer_company_name",
      "group": "true",
      "wt": "json"
    }
  },
  "grouped": {
    "customer_company_name": {
      "matches": 565,
      "ngroups": 352,
      "groups": [
        { ...

So I presume I can paginate through the groups by using &start=300&rows=60 ?
Since the total ngroups is 352, I should get back a JSON response with the
remaining 52 groups. However, when I do that my result seems to come back with
0 groups, an empty groups block, like this:
{
  "responseHeader": {
    "status": 0,
    "QTime": 9,
    "params": {
      "group.ngroups": "true",
      "indent": "true",
      "start": "300",
      "q": "customer_company_name_lc:com*",
      "group.field": "customer_company_name",
      "group": "true",
      "wt": "json",
      "rows": "2"
    }
  },
  "grouped": {
    "customer_company_name": {
      "matches": 565,
      "ngroups": 352,
      "groups": [ ]
    }
  }
}


May I know what I am doing wrong or should I look at the problem from a 
different point of view ?

Thanks.
Maz


Issue while setting Solr on Slider / YARN

2015-08-17 Thread Vijay Bhoomireddy
Hi,

 

Any help on this please?

 

Thanks & Regards

Vijay

 

From: Vijay Bhoomireddy [mailto:vijaya.bhoomire...@whishworks.com] 
Sent: 14 August 2015 18:03
To: solr-user@lucene.apache.org
Subject: Issue while setting Solr on Slider / YARN

 

Hi,

 

We have a requirement to set up SolrCloud to work along with Hadoop.
Earlier, I could set up a SolrCloud cluster separately alongside the Hadoop
cluster, i.e. it looks like two logical clusters sitting next to each other,
both relying on HDFS.

 

However, the experiment now I am trying to do is to install SolrCloud on
YARN using Apache Slider. I am following LucidWorks blog at
https://github.com/LucidWorks/solr-slider for the same. I already have a
Hortonworks HDP cluster. When I try to setup Solr on my HDP cluster using
Slider, I am facing some issues.

 

As per the blog, I have performed the below steps:

 

1.   I have setup a single node HDP cluster for which the hostname is
myhdpcluster.com with all the essential services including ZooKeeper and
Slider running on it.

2.   Updated the resource manager address and port in slider-client.xml
present under /var/hdp/current/slider/conf



<property>
  <name>yarn.resourcemanager.address</name>
  <value>myhdpcluster.com:8032</value>
</property>



3.   Cloned the LucidWorks git and moved it under /user/hdfs/solr-slider

4.   Downloaded solr latest stable distribution and renamed it as
solr.tgz and placed it under /user/hdfs/solr-slider/package/files/solr.tgz

5.   Next ran the following command from within the
/user/hdfs/solr-slider folder

zip -r solr-on-yarn.zip metainfo.xml package/

6.   Next ran the following command as hdfs user

slider install-package --replacepkg --name solr --package
/user/hdfs/solr-slider/solr-on-yarn.zip

7.   Modified the following settings in the
/user/hdfs/solr-slider/appConfig-default.json file

"java_home": MY_JAVA_HOME_LOCATION

"site.global.app_root": "${AGENT_WORK_ROOT}/app/install/solr-5.2.1",
(Should this be changed to any other value?)

"site.global.zk_host": " myhdpcluster.com:2181",

8.   Set yarn.component.instances to 1 in resources-default.json file

9.   Next ran the following command

slider create solr --template /user/hdfs/solr-slider/appConfig-default.json
--resources /user/hdfs/solr-slider/resources-default.json

 

During this step, I am seeing a message: INFO client.RMProxy - Connecting to
ResourceManager at myhdpcluster.com/10.0.2.15:8032 

 
INFO ipc.Client - Retrying connect to server:
myhdpcluster.com/10.0.2.15:8032. Already tried 0 time(s); 

 

This message repeats 50 times, pauses for a couple of seconds, and then
prints again in an endless loop. Not sure where the problem is.

 

Can anyone please help me get past this issue and set up
Solr on Slider/YARN?

 

Thanks & Regards

Vijay


-- 
The contents of this e-mail are confidential and for the exclusive use of 
the intended recipient. If you receive this e-mail in error please delete 
it from your system immediately and notify us either by e-mail or 
telephone. You should not copy, forward or otherwise disclose the content 
of the e-mail. The views expressed in this communication may not 
necessarily be the view held by WHISHWORKS.


Re: how to extend JavaBinCodec and make it available in solrj api

2015-08-17 Thread Dmitry Kan
Thanks, Shalin! We have survived by passing our custom structure as a string in
JSON. It is still to be tested for performance.
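
For the record, the nested NamedList/SimpleOrderedMap shape Shalin suggests
below would look roughly like this inside a custom component; the field names
are illustrative, not our actual structure:

import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;
import org.apache.solr.response.SolrQueryResponse;

// Build one snippet entry as nested maps instead of custom XML tags.
static void addCustomHighlighting(SolrQueryResponse rsp) {
  SimpleOrderedMap<Object> snippet = new SimpleOrderedMap<>();
  snippet.add("id", "id1");
  snippet.add("text", "Snippet text goes here");

  NamedList<Object> doc = new NamedList<>();
  doc.add("snippet", snippet);

  NamedList<Object> highlighting = new NamedList<>();
  highlighting.add("doc", doc);

  // Any response writer (XML, JSON, javabin) can serialize this directly.
  rsp.add("customHighlighting", highlighting);
}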

On Sat, Aug 8, 2015 at 5:22 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Or use the XsltResponseWriter :)
>
> On Sat, Aug 8, 2015 at 7:51 PM, Shalin Shekhar Mangar
>  wrote:
> > No, I'm afraid you will have to extend the XmlResponseWriter in that
> case.
> >
> > On Sat, Aug 8, 2015 at 2:02 PM, Dmitry Kan  wrote:
> >> Shalin,
> >>
> >> Thanks, can I also introduce custom entity tags like in my example with
> the
> >> highlighter output?
> >>
> >> Dmitry
> >>
> >> On Fri, Aug 7, 2015 at 5:10 PM, Shalin Shekhar Mangar <
> >> shalinman...@gmail.com> wrote:
> >>
> >>> The thing is that you are trying to introduce custom xml tags which
> >>> require changing the response writers. Instead, if you just used
> >>> nested maps/lists or SimpleOrderedMap/NamedList then every response
> >>> writer should be able to just directly write the output. Nesting is
> >>> not a problem.
> >>>
> >>> On Fri, Aug 7, 2015 at 6:09 PM, Dmitry Kan 
> wrote:
> >>> > Shawn:
> >>> >
> >>> > thanks, we found an intermediate solution by serializing our data
> >>> structure
> >>> > using string representation, perhaps less optimal than using binary
> >>> format
> >>> > directly.
> >>> >
> >>> > In the original router with JavaBinCodec we found, that
> >>> > BinaryResponseWriter should also be extended. But the following
> method is
> >>> > static and does not allow extending:
> >>> >
> >>> > public static NamedList getParsedResponse(SolrQueryRequest
> >>> > req, SolrQueryResponse rsp) {
> >>> >   try {
> >>> > Resolver resolver = new Resolver(req, rsp.getReturnFields());
> >>> >
> >>> > ByteArrayOutputStream out = new ByteArrayOutputStream();
> >>> > new JavaBinCodec(resolver).marshal(rsp.getValues(), out);
> >>> >
> >>> > InputStream in = new ByteArrayInputStream(out.toByteArray());
> >>> > return (NamedList) new
> JavaBinCodec(resolver).unmarshal(in);
> >>> >   }
> >>> >   catch (Exception ex) {
> >>> > throw new RuntimeException(ex);
> >>> >   }
> >>> > }
> >>> >
> >>> >
> >>> >
> >>> > Shalin:
> >>> >
> >>> > We needed new data structure in highlighter with more nested levels,
> >>> > than just one. Something like this (in xml representation):
> >>> >
> >>> > 
> >>> >   
> >>> > 
> >>> >   
> >>> >
> >>> >  id1
> >>> >
> >>> >  Snippet text goes here
> >>> >
> >>> >  
> >>> >
> >>> >   
> >>> >
> >>> > 
> >>> >
> >>> >
> >>> >   
> >>> >
> >>> > Can this be modelled with existing types?
> >>> >
> >>> >
> >>> > On Thu, Aug 6, 2015 at 9:47 PM, Shalin Shekhar Mangar <
> >>> > shalinman...@gmail.com> wrote:
> >>> >
> >>> >> What do you mean by a custom format? As long as your custom
> component
> >>> >> is writing primitives or NamedList/SimpleOrderedMap or collections
> >>> >> such as List/Map, any response writer should be able to handle them.
> >>> >>
> >>> >> On Wed, Aug 5, 2015 at 5:08 PM, Dmitry Kan 
> >>> wrote:
> >>> >> > Hello,
> >>> >> >
> >>> >> > Solr: 5.2.1
> >>> >> > class: org.apache.solr.common.util.JavaBinCodec
> >>> >> >
> >>> >> > I'm working on a custom data structure for the highlighter. The
> data
> >>> >> > structure is ready in JSON and XML formats. I need also JavaBin
> >>> format.
> >>> >> The
> >>> >> > data structure is already made serializable by extending the
> >>> >> WritableValue
> >>> >> > class (methods write and resolve).
> >>> >> >
> >>> >> > To receive the custom format on the client via solrj api, the data
> >>> >> > structure needs to be parseable by JavaBinCodec. Is this correct
> >>> >> > assumption? Can we introduce the custom data structure consumer
> on the
> >>> >> > solrj api without complete overhaul of the api? Is there plugin
> >>> framework
> >>> >> > such that JavaBinCodec is extended and used for the new data
> >>> structure?
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > --
> >>> >> > Dmitry Kan
> >>> >> > Luke Toolbox: http://github.com/DmitryKey/luke
> >>> >> > Blog: http://dmitrykan.blogspot.com
> >>> >> > Twitter: http://twitter.com/dmitrykan
> >>> >> > SemanticAnalyzer: www.semanticanalyzer.info
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Regards,
> >>> >> Shalin Shekhar Mangar.
> >>> >>
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Dmitry Kan
> >>> > Luke Toolbox: http://github.com/DmitryKey/luke
> >>> > Blog: http://dmitrykan.blogspot.com
> >>> > Twitter: http://twitter.com/dmitrykan
> >>> > SemanticAnalyzer: www.semanticanalyzer.info
> >>>
> >>>
> >>>
> >>> --
> >>> Regards,
> >>> Shalin Shekhar Mangar.
> >>>
> >>
> >>
> >>
> >> --
> >> Dmitry Kan
> >> Luke Toolbox: http://github.com/DmitryKey/luke
> >> Blog: http://dmitrykan.blogspot.com
> >> Twitter: http://twitter.com/dmitrykan
> >> SemanticAnalyzer: www.semanticanalyzer.info
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Dmitry Kan
Luke Toolbox: http://github.com/Dmitry

Logging in solr admin page

2015-08-17 Thread davidphilip cherian
Hi,

Where are the logs fetched from on the Solr admin UI page
http://localhost:8983/solr/#/~logging? I am unable to see any logs there.
It's just showing the 'loading' symbol but no logs are fetched. What could be
the reason? Any logging setting that has to be made?

Thanks.