Re: Compression vs FieldCache for doc ids retrieval

2014-05-31 Thread William Bell
Why not just submit a JIRA issue - and add your patch so that we can all
benefit?


On Fri, May 30, 2014 at 5:34 AM, Manuel Le Normand <
manuel.lenorm...@gmail.com> wrote:

> Is the issue SOLR-5478 what you were looking for?
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Uneven shard heap usage

2014-05-31 Thread Otis Gospodnetic
Hi Joe,

Are you/how are you sure all 3 shards are roughly the same size?  Can you
share what you run/see that shows you that?

Are you sure queries are evenly distributed?  Something like SPM should give
you insight into that.

How big are your caches?

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Sat, May 31, 2014 at 5:54 PM, Joe Gresock  wrote:

> Interesting thought about the routing.  Our document ids are in 3 parts:
>
> <10-digit identifier>!<timestamp>!<type>
>
> e.g., 5/12345678!13025603!TEXT
>
> Each object has an identifier, and there may be multiple versions of the
> object, hence the timestamp.  We like to be able to pull back all of the
> versions of an object at once, hence the routing scheme.
>
> The nature of the identifier is that a great many of them begin with a
> certain number.  I'd be interested to know more about the hashing scheme
> used for the document routing.  Perhaps the first character gives it more
> weight as to which shard it lands in?
>
> It seems strange that certain of the most highly-searched documents would
> happen to fall on this shard, but you may be onto something.   We'll scrape
> through some non-distributed queries and see what we can find.
>
>
> On Sat, May 31, 2014 at 1:47 PM, Erick Erickson 
> wrote:
>
> > This is very weird.
> >
> > Are you sure that all the Java versions are identical? And all the JVM
> > parameters are the same? Grasping at straws here.
> >
> > More grasping at straws: I'm a little suspicious that you are using
> > routing. You say that the indexes are about the same size, but is it
> > possible that your routing is somehow loading the problem shard
> abnormally?
> > By that I mean somehow the documents on that shard are different, or
> have a
> > drastically higher number of hits than the other shards?
> >
> > You can fire queries at shards with &distrib=false and NOT have it go to
> > other shards, perhaps if you can isolate the problem queries that might
> > shed some light on the problem.
> >
> >
> > Best
> > er...@baffled.com
> >
> >
> > On Sat, May 31, 2014 at 8:33 AM, Joe Gresock  wrote:
> >
> > > It has taken as little as 2 minutes to happen the last time we tried.
>  It
> > > basically happens upon high query load (peak user hours during the
> day).
> > >  When we reduce functionality by disabling most searches, it
> stabilizes.
> > >  So it really is only on high query load.  Our ingest rate is fairly
> low.
> > >
> > > It happens no matter how many nodes in the shard are up.
> > >
> > >
> > > Joe
> > >
> > >
> > > On Sat, May 31, 2014 at 11:04 AM, Jack Krupansky <
> > j...@basetechnology.com>
> > > wrote:
> > >
> > > > When you restart, how long does it take to hit the problem? And how
> > much
> > > > query or update activity is happening in that time? Is there any
> other
> > > > activity showing up in the log?
> > > >
> > > > If you bring up only a single node in that problematic shard, do you
> > > still
> > > > see the problem?
> > > >
> > > > -- Jack Krupansky
> > > >
> > > > -Original Message- From: Joe Gresock
> > > > Sent: Saturday, May 31, 2014 9:34 AM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Uneven shard heap usage
> > > >
> > > >
> > > > Hi folks,
> > > >
> > > > I'm trying to figure out why one shard of an evenly-distributed
> 3-shard
> > > > cluster would suddenly start running out of heap space, after 9+
> months
> > > of
> > > > stable performance.  We're using the "!" delimiter in our ids to
> > > distribute
> > > > the documents, and indeed the disk size of our shards are very
> similar
> > > > (31-32GB on disk per replica).
> > > >
> > > > Our setup is:
> > > > 9 VMs with 16GB RAM, 8 vcpus (with a 4:1 oversubscription ratio, so
> > > > basically 2 physical CPUs), 24GB disk
> > > > 3 shards, 3 replicas per shard (1 leader, 2 replicas, whatever).  We
> > > > reserve 10g heap for each solr instance.
> > > > Also 3 zookeeper VMs, which are very stable
> > > >
> > > > Since the troubles started, we've been monitoring all 9 with
> jvisualvm,
> > > and
> > > > shards 2 and 3 keep a steady amount of heap space reserved, always
> > having
> > > > horizontal lines (with some minor gc).  They're using 4-5GB heap, and
> > > when
> > > > we force gc using jvisualvm, they drop to 1GB usage.  Shard 1,
> however,
> > > > quickly has a steep slope, and eventually has concurrent mode
> failures
> > in
> > > > the gc logs, requiring us to restart the instances when they can no
> > > longer
> > > > do anything but gc.
> > > >
> > > > We've tried ruling out physical host problems by moving all 3 Shard 1
> > > > replicas to different hosts that are underutilized, however we still
> > get
> > > > the same problem.  We'll still be working on ruling out
> infrastructure
> > > > issues, but I wanted to ask the questions here in case it makes
> sense:
> > > >
> > > > * Does it make sense that all the replicas on one shard of a cluster
> > > > would have heap problems, when the other shard replicas do not, assuming
> > > > a fairly even data distribution?
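
For reference on the hashing question raised in the quoted message above: SolrCloud's
default compositeId router hashes the route prefix (the part of the id before the
first "!") with MurmurHash3 and takes the high bits of that hash to choose the shard,
so ids that merely start with the same digit still scatter across shards; only
documents sharing the entire route prefix are guaranteed to land together. The sketch
below is a simplified illustration for the common two-part case, using Solr's own
Hash utility; the class name and example ids are made up, and the exact bit layout
for multi-part ids or the "prefix/bits!" syntax should be checked against
CompositeIdRouter in your Solr version.

import org.apache.solr.common.util.Hash;

// Simplified sketch of how the compositeId router derives a routing hash for
// ids of the form "prefix!rest". Not the authoritative implementation; see
// CompositeIdRouter in your Solr release for the real logic.
public class CompositeIdHashSketch {

    static int routeHash(String id) {
        int bang = id.indexOf('!');
        if (bang < 0) {
            // Plain ids are hashed whole.
            return Hash.murmurhash3_x86_32(id, 0, id.length(), 0);
        }
        String prefix = id.substring(0, bang);
        String rest = id.substring(bang + 1);
        int prefixHash = Hash.murmurhash3_x86_32(prefix, 0, prefix.length(), 0);
        int restHash = Hash.murmurhash3_x86_32(rest, 0, rest.length(), 0);
        // Top 16 bits come from the route prefix, bottom 16 from the rest;
        // the shard is whichever hash range this 32-bit value falls into.
        return (prefixHash & 0xFFFF0000) | (restHash & 0x0000FFFF);
    }

    public static void main(String[] args) {
        // Two ids that share only a leading digit hash very differently,
        // so a common first character should not skew shard assignment.
        System.out.printf("%08x%n", routeHash("1234567890!13025603!TEXT"));
        System.out.printf("%08x%n", routeHash("1234567891!13025603!TEXT"));
    }
}

If one shard really is hot, the likelier causes are many documents sharing the same
full route prefix, or simply heavier query traffic against that shard's documents,
as the rest of the thread suggests.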

Re: Uneven shard heap usage

2014-05-31 Thread Michael Sokolov
Is it possible that all your requests are routed to that single shard?  
I.e. you are not using the smart client that round-robins requests?  I 
think that could cause all of the merging of results to be done on a 
single node.


Also - is it possible you have a "bad" document in that shard? Like one 
that has a GB stored field or something?


-Mike

On 5/31/2014 5:54 PM, Joe Gresock wrote:

Interesting thought about the routing.  Our document ids are in 3 parts:

<10-digit identifier>!<timestamp>!<type>

e.g., 5/12345678!13025603!TEXT

Each object has an identifier, and there may be multiple versions of the
object, hence the timestamp.  We like to be able to pull back all of the
versions of an object at once, hence the routing scheme.

The nature of the identifier is that a great many of them begin with a
certain number.  I'd be interested to know more about the hashing scheme
used for the document routing.  Perhaps the first character gives it more
weight as to which shard it lands in?

It seems strange that certain of the most highly-searched documents would
happen to fall on this shard, but you may be onto something.   We'll scrape
through some non-distributed queries and see what we can find.


On Sat, May 31, 2014 at 1:47 PM, Erick Erickson 
wrote:


This is very weird.

Are you sure that all the Java versions are identical? And all the JVM
parameters are the same? Grasping at straws here.

More grasping at straws: I'm a little suspicious that you are using
routing. You say that the indexes are about the same size, but is it
possible that your routing is somehow loading the problem shard abnormally?
By that I mean somehow the documents on that shard are different, or have a
drastically higher number of hits than the other shards?

You can fire queries at shards with &distrib=false and NOT have it go to
other shards, perhaps if you can isolate the problem queries that might
shed some light on the problem.


Best
er...@baffled.com


On Sat, May 31, 2014 at 8:33 AM, Joe Gresock  wrote:


It has taken as little as 2 minutes to happen the last time we tried.  It
basically happens upon high query load (peak user hours during the day).
  When we reduce functionality by disabling most searches, it stabilizes.
  So it really is only on high query load.  Our ingest rate is fairly low.

It happens no matter how many nodes in the shard are up.


Joe


On Sat, May 31, 2014 at 11:04 AM, Jack Krupansky <

j...@basetechnology.com>

wrote:


When you restart, how long does it take to hit the problem? And how

much

query or update activity is happening in that time? Is there any other
activity showing up in the log?

If you bring up only a single node in that problematic shard, do you

still

see the problem?

-- Jack Krupansky

-Original Message- From: Joe Gresock
Sent: Saturday, May 31, 2014 9:34 AM
To: solr-user@lucene.apache.org
Subject: Uneven shard heap usage


Hi folks,

I'm trying to figure out why one shard of an evenly-distributed 3-shard
cluster would suddenly start running out of heap space, after 9+ months

of

stable performance.  We're using the "!" delimiter in our ids to

distribute

the documents, and indeed the disk size of our shards are very similar
(31-32GB on disk per replica).

Our setup is:
9 VMs with 16GB RAM, 8 vcpus (with a 4:1 oversubscription ratio, so
basically 2 physical CPUs), 24GB disk
3 shards, 3 replicas per shard (1 leader, 2 replicas, whatever).  We
reserve 10g heap for each solr instance.
Also 3 zookeeper VMs, which are very stable

Since the troubles started, we've been monitoring all 9 with jvisualvm,

and

shards 2 and 3 keep a steady amount of heap space reserved, always

having

horizontal lines (with some minor gc).  They're using 4-5GB heap, and

when

we force gc using jvisualvm, they drop to 1GB usage.  Shard 1, however,
quickly has a steep slope, and eventually has concurrent mode failures

in

the gc logs, requiring us to restart the instances when they can no

longer

do anything but gc.

We've tried ruling out physical host problems by moving all 3 Shard 1
replicas to different hosts that are underutilized, however we still

get

the same problem.  We'll still be working on ruling out infrastructure
issues, but I wanted to ask the questions here in case it makes sense:

* Does it make sense that all the replicas on one shard of a cluster

would

have heap problems, when the other shard replicas do not, assuming a

fairly

even data distribution?
* One thing we changed recently was to make all of our fields stored,
instead of only half of them.  This was to support atomic updates.  Can
stored fields, even though lazily loaded, cause problems like this?

Thanks for any input,
Joe





--
I know what it is to be in need, and I know what it is to have plenty.

  I

have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can

do

all this through him who gives me strength.*-Philippians 4:12-13*

Re: Uneven shard heap usage

2014-05-31 Thread Joe Gresock
Interesting thought about the routing.  Our document ids are in 3 parts:

<10-digit identifier>!<timestamp>!<type>

e.g., 5/12345678!13025603!TEXT

Each object has an identifier, and there may be multiple versions of the
object, hence the timestamp.  We like to be able to pull back all of the
versions of an object at once, hence the routing scheme.

The nature of the identifier is that a great many of them begin with a
certain number.  I'd be interested to know more about the hashing scheme
used for the document routing.  Perhaps the first character gives it more
weight as to which shard it lands in?

It seems strange that certain of the most highly-searched documents would
happen to fall on this shard, but you may be onto something.   We'll scrape
through some non-distributed queries and see what we can find.


On Sat, May 31, 2014 at 1:47 PM, Erick Erickson 
wrote:

> This is very weird.
>
> Are you sure that all the Java versions are identical? And all the JVM
> parameters are the same? Grasping at straws here.
>
> More grasping at straws: I'm a little suspicious that you are using
> routing. You say that the indexes are about the same size, but is it
> possible that your routing is somehow loading the problem shard abnormally?
> By that I mean somehow the documents on that shard are different, or have a
> drastically higher number of hits than the other shards?
>
> You can fire queries at shards with &distrib=false and NOT have it go to
> other shards, perhaps if you can isolate the problem queries that might
> shed some light on the problem.
>
>
> Best
> er...@baffled.com
>
>
> On Sat, May 31, 2014 at 8:33 AM, Joe Gresock  wrote:
>
> > It has taken as little as 2 minutes to happen the last time we tried.  It
> > basically happens upon high query load (peak user hours during the day).
> >  When we reduce functionality by disabling most searches, it stabilizes.
> >  So it really is only on high query load.  Our ingest rate is fairly low.
> >
> > It happens no matter how many nodes in the shard are up.
> >
> >
> > Joe
> >
> >
> > On Sat, May 31, 2014 at 11:04 AM, Jack Krupansky <
> j...@basetechnology.com>
> > wrote:
> >
> > > When you restart, how long does it take to hit the problem? And how
> much
> > > query or update activity is happening in that time? Is there any other
> > > activity showing up in the log?
> > >
> > > If you bring up only a single node in that problematic shard, do you
> > still
> > > see the problem?
> > >
> > > -- Jack Krupansky
> > >
> > > -Original Message- From: Joe Gresock
> > > Sent: Saturday, May 31, 2014 9:34 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: Uneven shard heap usage
> > >
> > >
> > > Hi folks,
> > >
> > > I'm trying to figure out why one shard of an evenly-distributed 3-shard
> > > cluster would suddenly start running out of heap space, after 9+ months
> > of
> > > stable performance.  We're using the "!" delimiter in our ids to
> > distribute
> > > the documents, and indeed the disk size of our shards are very similar
> > > (31-32GB on disk per replica).
> > >
> > > Our setup is:
> > > 9 VMs with 16GB RAM, 8 vcpus (with a 4:1 oversubscription ratio, so
> > > basically 2 physical CPUs), 24GB disk
> > > 3 shards, 3 replicas per shard (1 leader, 2 replicas, whatever).  We
> > > reserve 10g heap for each solr instance.
> > > Also 3 zookeeper VMs, which are very stable
> > >
> > > Since the troubles started, we've been monitoring all 9 with jvisualvm,
> > and
> > > shards 2 and 3 keep a steady amount of heap space reserved, always
> having
> > > horizontal lines (with some minor gc).  They're using 4-5GB heap, and
> > when
> > > we force gc using jvisualvm, they drop to 1GB usage.  Shard 1, however,
> > > quickly has a steep slope, and eventually has concurrent mode failures
> in
> > > the gc logs, requiring us to restart the instances when they can no
> > longer
> > > do anything but gc.
> > >
> > > We've tried ruling out physical host problems by moving all 3 Shard 1
> > > replicas to different hosts that are underutilized, however we still
> get
> > > the same problem.  We'll still be working on ruling out infrastructure
> > > issues, but I wanted to ask the questions here in case it makes sense:
> > >
> > > * Does it make sense that all the replicas on one shard of a cluster
> > would
> > > have heap problems, when the other shard replicas do not, assuming a
> > fairly
> > > even data distribution?
> > > * One thing we changed recently was to make all of our fields stored,
> > > instead of only half of them.  This was to support atomic updates.  Can
> > > stored fields, even though lazily loaded, cause problems like this?
> > >
> > > Thanks for any input,
> > > Joe
> > >
> > >
> > >
> > >
> > >
> > > --
> > > I know what it is to be in need, and I know what it is to have plenty.
>  I
> > > have learned the secret of being content in any and every situation,
> > > whether well fed or hungry, whether living in plenty or in want.  I can
> > do
> > > all this through him who gives me strength.*-Philippians 4:12-13*

Re: Full Indexing fails on Solr-Probable connection issue.HELP!

2014-05-31 Thread Aniket Bhoi
still awaiting a response from someone


On Tue, May 27, 2014 at 1:35 PM, Aniket Bhoi  wrote:

>
>
>
> On Mon, May 26, 2014 at 4:14 PM, Aniket Bhoi 
> wrote:
>
>> Another thing I have noted is that the exception always follows a commit
>> operation.Log excerpt below:
>>
>> INFO: SolrDeletionPolicy.onCommit: commits:num=2
>> commit{dir=/opt/solr/cores/calls/data/index,segFN=segments_2qt,version=1347458723267,generation=3557,filenames=[_3z9.tii,
>> _3z3.fnm, _3z9.nrm, _3za.prx, _3z9.fdt, _3z9.fnm, _3z9.fdx, _3z3.frq,
>> _3za.nrm, segments_2qt, _3z3.fdx, _3z9.prx, _3z3.fdt, _3za.fdx, _3z9.frq,
>> _3z3.prx, _3za.fdt, _3z3.tii, _3za.tis, _3za.fnm, _3z3.nrm, _3z9.tis,
>> _3za.tii, _3za.frq, _3z3.tis]
>>  
>> commit{dir=/opt/solr/cores/calls/data/index,segFN=segments_2qu,version=1347458723269,generation=3558,filenames=[_3zb.fdt,
>> _3z9.tii, _3z3.fnm, _3z9.nrm, _3zb.tii, _3zb.tis, _3zb.fdx, _3za.prx,
>> _3z9.fdt, _3z9.fnm, _3z9.fdx, _3zb.frq, _3z3.frq, _3za.nrm, segments_2qu,
>> _3z3.fdx, _3zb.prx, _3z9.prx, _3zb.fnm, _3z3.fdt, _3za.fdx, _3z9.frq,
>> _3z3.prx, _3za.fdt, _3zb.nrm, _3z3.tii, _3za.tis, _3za.fnm, _3z3.nrm,
>> _3z9.tis, _3za.tii, _3za.frq, _3z3.tis]
>> May 24, 2014 5:49:05 AM org.apache.solr.core.SolrDeletionPolicy
>> updateCommits
>> INFO: newest commit = 1347458723269
>> May 24, 2014 5:49:05 AM org.apache.solr.search.SolrIndexSearcher <init>
>> INFO: Opening Searcher@423dbcca main
>> May 24, 2014 5:49:05 AM org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming Searcher@423dbcca main from Searcher@19c19869 main
>>
>> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>> May 24, 2014 5:49:05 AM org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming result for Searcher@423dbcca main
>>
>> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>> May 24, 2014 5:49:05 AM org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming Searcher@423dbcca main from Searcher@19c19869 main
>>
>> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>> May 24, 2014 5:49:05 AM org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming result for Searcher@423dbcca main
>>
>> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>> May 24, 2014 5:49:05 AM org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming Searcher@423dbcca main from Searcher@19c19869 main
>>
>> queryResultCache{lookups=1,hits=1,hitratio=1.00,inserts=3,evictions=0,size=3,warmupTime=2,cumulative_lookups=47,cumulative_hits=46,cumulative_hitratio=0.97,cumulative_inserts=1,cumulative_evictions=0}
>> May 24, 2014 5:49:05 AM org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming result for Searcher@423dbcca main
>>
>> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=2,cumulative_lookups=47,cumulative_hits=46,cumulative_hitratio=0.97,cumulative_inserts=1,cumulative_evictions=0}
>> May 24, 2014 5:49:05 AM org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming Searcher@423dbcca main from Searcher@19c19869 main
>>
>> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=40,evictions=0,size=40,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>> May 24, 2014 5:49:05 AM org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming result for Searcher@423dbcca main
>>
>> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>> May 24, 2014 5:49:05 AM org.apache.solr.core.QuerySenderListener
>> newSearcher
>> INFO: QuerySenderListener sending requests to Searcher@423dbcca main
>> May 24, 2014 5:49:05 AM org.apache.solr.core.SolrCore execute
>> INFO: [calls] webapp=null path=null
>> params={start=0&event=newSearcher&q=*:*&rows=20} hits=40028 status=0
>> QTime=2
>> May 24, 2014 5:49:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: end_commit_flush
>> May 24, 2014 5:49:05 AM org.apache.solr.core.SolrCore execute
>> INFO: [calls] webapp=null path=null
>> params={start=0&event=newSearcher&q=banking&rows=20} hits=636 status=0
>> QTime=3
>> May 24, 2014 5:49:05 AM org.apache.solr.core.QuerySenderListener
>> newSearcher
>> INFO: QuerySenderListener done.
>> May 24, 2014 5:49:05 AM
>> org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener
>> newSearcher
>

Re: DIH Issue

2014-05-31 Thread PeriS
Seems like the issue got fixed after I updated my import query to the following;

  




On May 31, 2014, at 9:24 AM, PeriS  wrote:

> db-import.xml:
> <entity name="..."
> query="SELECT p.person_id, p.details, p.authz_id, 
> p.last_modified, a.address_info from person p, address a where p.person_id = 
> a.person_id"
>deltaImportQuery="SELECT p.person_id, p.authz_id, p.details, 
> p.last_modified, a.address_info from person p, address a where p.person_id = 
> a.person_id and p.person_id='${dataimporter.delta.person_id}'"
>deltaQuery="SELECT person_id from person where last_modified 
> > '${dataimporter.last_index_time}'">
> 
> 
> schema.xml:
>  
>  
>  <uniqueKey>person_id</uniqueKey>
> 
> Thanks
> Peri
> 
> 
> On May 31, 2014, at 8:05 AM, Ahmet Arslan  wrote:
> 
>> Hi,
>> 
>> Did you restart solr? Can you paste relevant portions of data-config.xml and 
>> schema.xml?
>> 
>> 
>> 
>> On Saturday, May 31, 2014 3:46 AM, PeriS  wrote:
>> I added the primaryKey as the <uniqueKey> and still same result. 
>> 
>> On May 30, 2014, at 8:38 PM, Ahmet Arslan  wrote:
>> 
>>> Hi,
>>> 
>>> Sure, have a look at <uniqueKey> definition in example schema.xml 
>>> 
>>> http://wiki.apache.org/solr/UniqueKey
>>> 
>>> 
>>> 
>>> 
>>> On Saturday, May 31, 2014 3:35 AM, PeriS  
>>> wrote:
>>> No. Is there a way to have the primary key of my entity be the unique key?
>>> 
>>> 
>>> On May 30, 2014, at 7:00 PM, Ahmet Arslan  wrote:
>>> 
 Hi,
 
 Do you have uniqueKey defined in schema.xml ?
 
 
 
 On Saturday, May 31, 2014 1:23 AM, PeriS  
 wrote:
 Hi,
 
 I have followed the documentation to set up my delta query, but when I 
 call the delta-import, the index is happening again for the same  record 
 and ends up being indexed twice. Any clues please?
 
 Thanks
 -Peri
 
 
 
 
 
 *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
 recipient, please delete without copying and kindly advise us by e-mail of 
 the mistake in delivery.
 NOTE: Regardless of content, this e-mail shall not operate to bind HTC 
 Global Services to any order or other contract unless pursuant to explicit 
 written agreement or government initiative expressly permitting the use of 
 e-mail for such purpose.
 
 
>>> 
 Thank you,
 Peri Subrahmanya
 HTC Global Services 
 (KOLE)
 Cell: (+1) 618.407.3521
 Skype/Gtalk: peri.subrahmanya
 
 *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
 recipient, please delete without copying and kindly advise us by e-mail of 
 the mistake in delivery.
 NOTE: Regardless of content, this e-mail shall not operate to bind HTC 
 Global Services to any order or other contract unless pursuant to explicit 
 written agreement or government initiative expressly permitting the use of 
 e-mail for such purpose.
>> 
>> 
>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
>>> recipient, please delete without copying and kindly advise us by e-mail of 
>>> the mistake in delivery.
>>> NOTE: Regardless of content, this e-mail shall not operate to bind HTC 
>>> Global Services to any order or other contract unless pursuant to explicit 
>>> written agreement or government initiative expressly permitting the use of 
>>> e-mail for such purpose.
>>> 
>>> 
>> 
>>> Thank you,
>>> Peri Subrahmanya
>>> HTC Global Services 
>>> (KOLE)
>>> Cell: (+1) 618.407.3521
>>> Skype/Gtalk: peri.subrahmanya
>>> 
>>> *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
>>> recipient, please delete without copying and kindly advise us by e-mail of 
>>> the mistake in delivery.
>>> NOTE: Regardless of content, this e-mail shall not operate to bind HTC 
>>> Global Services to any order or other contract unless pursuant to explicit 
>>> written agreement or government initiative expressly permitting the use of 
>>> e-mail for such purpose.
>> 
>> 
>> 
>> *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
>> recipient, please delete without copying and kindly advise us by e-mail of 
>> the mistake in delivery.
>> NOTE: Regardless of content, this e-mail shall not operate to bind HTC 
>> Global Services to any order or other contract unless pursuant to explicit 
>> written agreement or government initiative expressly permitting the use of 
>> e-mail for such purpose.
>> 
>> 
> 
> 
> 
> 
> *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
> recipient, please delete without copying and kindly advise us by e-mail of 
> the mistake in delivery.
> NOTE: Regardless of content, this e-mail shall not operate to bind HTC Global 
> Services to any order or other contract unless pursuant to explicit written 
> agreement or government initiative expressly permitting the use of e-mail for 
> such purpose.
> 
> 
> *** DISCLAIMER *** This is a PRIVATE message.

Re: Uneven shard heap usage

2014-05-31 Thread Erick Erickson
This is very weird.

Are you sure that all the Java versions are identical? And all the JVM
parameters are the same? Grasping at straws here.

More grasping at straws: I'm a little suspicious that you are using
routing. You say that the indexes are about the same size, but is it
possible that your routing is somehow loading the problem shard abnormally?
By that I mean somehow the documents on that shard are different, or have a
drastically higher number of hits than the other shards?

You can fire queries at shards with &distrib=false and NOT have it go to
other shards, perhaps if you can isolate the problem queries that might
shed some light on the problem.


Best
er...@baffled.com


On Sat, May 31, 2014 at 8:33 AM, Joe Gresock  wrote:

> It has taken as little as 2 minutes to happen the last time we tried.  It
> basically happens upon high query load (peak user hours during the day).
>  When we reduce functionality by disabling most searches, it stabilizes.
>  So it really is only on high query load.  Our ingest rate is fairly low.
>
> It happens no matter how many nodes in the shard are up.
>
>
> Joe
>
>
> On Sat, May 31, 2014 at 11:04 AM, Jack Krupansky 
> wrote:
>
> > When you restart, how long does it take to hit the problem? And how much
> > query or update activity is happening in that time? Is there any other
> > activity showing up in the log?
> >
> > If you bring up only a single node in that problematic shard, do you
> still
> > see the problem?
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Joe Gresock
> > Sent: Saturday, May 31, 2014 9:34 AM
> > To: solr-user@lucene.apache.org
> > Subject: Uneven shard heap usage
> >
> >
> > Hi folks,
> >
> > I'm trying to figure out why one shard of an evenly-distributed 3-shard
> > cluster would suddenly start running out of heap space, after 9+ months
> of
> > stable performance.  We're using the "!" delimiter in our ids to
> distribute
> > the documents, and indeed the disk size of our shards are very similar
> > (31-32GB on disk per replica).
> >
> > Our setup is:
> > 9 VMs with 16GB RAM, 8 vcpus (with a 4:1 oversubscription ratio, so
> > basically 2 physical CPUs), 24GB disk
> > 3 shards, 3 replicas per shard (1 leader, 2 replicas, whatever).  We
> > reserve 10g heap for each solr instance.
> > Also 3 zookeeper VMs, which are very stable
> >
> > Since the troubles started, we've been monitoring all 9 with jvisualvm,
> and
> > shards 2 and 3 keep a steady amount of heap space reserved, always having
> > horizontal lines (with some minor gc).  They're using 4-5GB heap, and
> when
> > we force gc using jvisualvm, they drop to 1GB usage.  Shard 1, however,
> > quickly has a steep slope, and eventually has concurrent mode failures in
> > the gc logs, requiring us to restart the instances when they can no
> longer
> > do anything but gc.
> >
> > We've tried ruling out physical host problems by moving all 3 Shard 1
> > replicas to different hosts that are underutilized, however we still get
> > the same problem.  We'll still be working on ruling out infrastructure
> > issues, but I wanted to ask the questions here in case it makes sense:
> >
> > * Does it make sense that all the replicas on one shard of a cluster
> would
> > have heap problems, when the other shard replicas do not, assuming a
> fairly
> > even data distribution?
> > * One thing we changed recently was to make all of our fields stored,
> > instead of only half of them.  This was to support atomic updates.  Can
> > stored fields, even though lazily loaded, cause problems like this?
> >
> > Thanks for any input,
> > Joe
> >
> >
> >
> >
> >
> > --
> > I know what it is to be in need, and I know what it is to have plenty.  I
> > have learned the secret of being content in any and every situation,
> > whether well fed or hungry, whether living in plenty or in want.  I can
> do
> > all this through him who gives me strength.*-Philippians 4:12-13*
> >
>
>
>
> --
> I know what it is to be in need, and I know what it is to have plenty.  I
> have learned the secret of being content in any and every situation,
> whether well fed or hungry, whether living in plenty or in want.  I can do
> all this through him who gives me strength.*-Philippians 4:12-13*
>
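
To make the &distrib=false suggestion above concrete, here is a minimal SolrJ sketch
that sends a query to one specific core without fanning out to the rest of the
collection. The host, port, and core name are placeholders, and the same check can
be done over plain HTTP by appending &distrib=false to that core's /select URL.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Queries a single core directly, keeping the request local to that core.
public class SingleCoreQuery {
    public static void main(String[] args) throws SolrServerException {
        HttpSolrServer shard1 = new HttpSolrServer(
                "http://shard1-host:8983/solr/collection1_shard1_replica1"); // placeholder core URL
        SolrQuery q = new SolrQuery("*:*");
        q.set("distrib", "false");   // do not distribute to other shards
        q.setRows(10);
        QueryResponse rsp = shard1.query(q);
        System.out.println("hits=" + rsp.getResults().getNumFound()
                + " qtime=" + rsp.getQTime());
    }
}

Comparing hit counts and QTimes reported this way across the three shards can show
whether shard 1 really serves disproportionately expensive queries.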


Re: search multiple cores

2014-05-31 Thread sunayansaikia
Hi Alvaro Cabrerizo,

Regarding the following --
"- B is a constraint over the documents in the coreB"

I tried and it seems if I try with the fields available only in coreB but
not in coreA, it throws an error saying, 'undefined field ''. The
field '' in coreB has indexed="true".

Any inputs will be of great help.

Thanks




--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-multiple-cores-tp4136059p4139063.html
Sent from the Solr - User mailing list archive at Nabble.com.
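
One note on the shards-style cross-core search discussed above: the query string is
parsed by every core that participates in the request, so a field defined only in
coreB will generally raise "undefined field" when coreA handles its part of it.
Restricting q and fq to fields present in both schemas, or adding a matching field
definition to coreA, usually avoids the error. A rough SolrJ sketch, with host, core
names, and field names as placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Distributed query over two cores; only fields defined in both schemas are referenced.
public class MultiCoreSearch {
    public static void main(String[] args) throws SolrServerException {
        HttpSolrServer coreA = new HttpSolrServer("http://localhost:8983/solr/coreA"); // placeholder
        SolrQuery q = new SolrQuery("common_field:value");   // field that exists in coreA AND coreB
        q.set("shards", "localhost:8983/solr/coreA,localhost:8983/solr/coreB");
        System.out.println(coreA.query(q).getResults().getNumFound());
    }
}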


Re: Uneven shard heap usage

2014-05-31 Thread Joe Gresock
It has taken as little as 2 minutes to happen the last time we tried.  It
basically happens upon high query load (peak user hours during the day).
 When we reduce functionality by disabling most searches, it stabilizes.
 So it really is only on high query load.  Our ingest rate is fairly low.

It happens no matter how many nodes in the shard are up.


Joe


On Sat, May 31, 2014 at 11:04 AM, Jack Krupansky 
wrote:

> When you restart, how long does it take to hit the problem? And how much
> query or update activity is happening in that time? Is there any other
> activity showing up in the log?
>
> If you bring up only a single node in that problematic shard, do you still
> see the problem?
>
> -- Jack Krupansky
>
> -Original Message- From: Joe Gresock
> Sent: Saturday, May 31, 2014 9:34 AM
> To: solr-user@lucene.apache.org
> Subject: Uneven shard heap usage
>
>
> Hi folks,
>
> I'm trying to figure out why one shard of an evenly-distributed 3-shard
> cluster would suddenly start running out of heap space, after 9+ months of
> stable performance.  We're using the "!" delimiter in our ids to distribute
> the documents, and indeed the disk size of our shards are very similar
> (31-32GB on disk per replica).
>
> Our setup is:
> 9 VMs with 16GB RAM, 8 vcpus (with a 4:1 oversubscription ratio, so
> basically 2 physical CPUs), 24GB disk
> 3 shards, 3 replicas per shard (1 leader, 2 replicas, whatever).  We
> reserve 10g heap for each solr instance.
> Also 3 zookeeper VMs, which are very stable
>
> Since the troubles started, we've been monitoring all 9 with jvisualvm, and
> shards 2 and 3 keep a steady amount of heap space reserved, always having
> horizontal lines (with some minor gc).  They're using 4-5GB heap, and when
> we force gc using jvisualvm, they drop to 1GB usage.  Shard 1, however,
> quickly has a steep slope, and eventually has concurrent mode failures in
> the gc logs, requiring us to restart the instances when they can no longer
> do anything but gc.
>
> We've tried ruling out physical host problems by moving all 3 Shard 1
> replicas to different hosts that are underutilized, however we still get
> the same problem.  We'll still be working on ruling out infrastructure
> issues, but I wanted to ask the questions here in case it makes sense:
>
> * Does it make sense that all the replicas on one shard of a cluster would
> have heap problems, when the other shard replicas do not, assuming a fairly
> even data distribution?
> * One thing we changed recently was to make all of our fields stored,
> instead of only half of them.  This was to support atomic updates.  Can
> stored fields, even though lazily loaded, cause problems like this?
>
> Thanks for any input,
> Joe
>
>
>
>
>
> --
> I know what it is to be in need, and I know what it is to have plenty.  I
> have learned the secret of being content in any and every situation,
> whether well fed or hungry, whether living in plenty or in want.  I can do
> all this through him who gives me strength.*-Philippians 4:12-13*
>



-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.*-Philippians 4:12-13*


Re: Uneven shard heap usage

2014-05-31 Thread Jack Krupansky
When you restart, how long does it take to hit the problem? And how much 
query or update activity is happening in that time? Is there any other 
activity showing up in the log?


If you bring up only a single node in that problematic shard, do you still 
see the problem?


-- Jack Krupansky

-Original Message- 
From: Joe Gresock

Sent: Saturday, May 31, 2014 9:34 AM
To: solr-user@lucene.apache.org
Subject: Uneven shard heap usage

Hi folks,

I'm trying to figure out why one shard of an evenly-distributed 3-shard
cluster would suddenly start running out of heap space, after 9+ months of
stable performance.  We're using the "!" delimiter in our ids to distribute
the documents, and indeed the disk size of our shards are very similar
(31-32GB on disk per replica).

Our setup is:
9 VMs with 16GB RAM, 8 vcpus (with a 4:1 oversubscription ratio, so
basically 2 physical CPUs), 24GB disk
3 shards, 3 replicas per shard (1 leader, 2 replicas, whatever).  We
reserve 10g heap for each solr instance.
Also 3 zookeeper VMs, which are very stable

Since the troubles started, we've been monitoring all 9 with jvisualvm, and
shards 2 and 3 keep a steady amount of heap space reserved, always having
horizontal lines (with some minor gc).  They're using 4-5GB heap, and when
we force gc using jvisualvm, they drop to 1GB usage.  Shard 1, however,
quickly has a steep slope, and eventually has concurrent mode failures in
the gc logs, requiring us to restart the instances when they can no longer
do anything but gc.

We've tried ruling out physical host problems by moving all 3 Shard 1
replicas to different hosts that are underutilized, however we still get
the same problem.  We'll still be working on ruling out infrastructure
issues, but I wanted to ask the questions here in case it makes sense:

* Does it make sense that all the replicas on one shard of a cluster would
have heap problems, when the other shard replicas do not, assuming a fairly
even data distribution?
* One thing we changed recently was to make all of our fields stored,
instead of only half of them.  This was to support atomic updates.  Can
stored fields, even though lazily loaded, cause problems like this?

Thanks for any input,
Joe





--
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.*-Philippians 4:12-13* 



Uneven shard heap usage

2014-05-31 Thread Joe Gresock
Hi folks,

I'm trying to figure out why one shard of an evenly-distributed 3-shard
cluster would suddenly start running out of heap space, after 9+ months of
stable performance.  We're using the "!" delimiter in our ids to distribute
the documents, and indeed the disk size of our shards are very similar
(31-32GB on disk per replica).

Our setup is:
9 VMs with 16GB RAM, 8 vcpus (with a 4:1 oversubscription ratio, so
basically 2 physical CPUs), 24GB disk
3 shards, 3 replicas per shard (1 leader, 2 replicas, whatever).  We
reserve 10g heap for each solr instance.
Also 3 zookeeper VMs, which are very stable

Since the troubles started, we've been monitoring all 9 with jvisualvm, and
shards 2 and 3 keep a steady amount of heap space reserved, always having
horizontal lines (with some minor gc).  They're using 4-5GB heap, and when
we force gc using jvisualvm, they drop to 1GB usage.  Shard 1, however,
quickly has a steep slope, and eventually has concurrent mode failures in
the gc logs, requiring us to restart the instances when they can no longer
do anything but gc.

We've tried ruling out physical host problems by moving all 3 Shard 1
replicas to different hosts that are underutilized, however we still get
the same problem.  We'll still be working on ruling out infrastructure
issues, but I wanted to ask the questions here in case it makes sense:

* Does it make sense that all the replicas on one shard of a cluster would
have heap problems, when the other shard replicas do not, assuming a fairly
even data distribution?
* One thing we changed recently was to make all of our fields stored,
instead of only half of them.  This was to support atomic updates.  Can
stored fields, even though lazily loaded, cause problems like this?

Thanks for any input,
Joe





-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.*-Philippians 4:12-13*
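
On the stored-fields question at the end of the message above: whether or not lazy
loading turns out to be the culprit, one low-risk way to keep large stored fields
from being pulled onto the heap for every search is to request only the fields the
application actually needs via fl. A small SolrJ sketch; the URL and field names are
placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Fetches only a small field list so large stored values are not materialized
// just to render search results.
public class FieldListQuery {
    public static void main(String[] args) throws SolrServerException {
        HttpSolrServer solr = new HttpSolrServer("http://solr-host:8983/solr/collection1"); // placeholder
        SolrQuery q = new SolrQuery("some query terms");
        q.setFields("id", "title", "score");  // hypothetical field names; avoids returning big bodies
        q.setRows(20);
        System.out.println(solr.query(q).getResults().size());
    }
}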


Re: DIH Issue

2014-05-31 Thread PeriS
db-import.xml:

<entity name="..."
        query="SELECT p.person_id, p.details, p.authz_id, p.last_modified, a.address_info
               from person p, address a where p.person_id = a.person_id"
        deltaImportQuery="SELECT p.person_id, p.authz_id, p.details, p.last_modified, a.address_info
               from person p, address a where p.person_id = a.person_id
               and p.person_id='${dataimporter.delta.person_id}'"
        deltaQuery="SELECT person_id from person where last_modified > '${dataimporter.last_index_time}'">

schema.xml:
  
  
  <uniqueKey>person_id</uniqueKey>

Thanks
Peri


On May 31, 2014, at 8:05 AM, Ahmet Arslan  wrote:

> Hi,
> 
> Did you restart solr? Can you paste relevant portions of data-config.xml and 
> schema.xml?
> 
> 
> 
> On Saturday, May 31, 2014 3:46 AM, PeriS  wrote:
> I added the primaryKey as the <uniqueKey> and still same result. 
> 
> On May 30, 2014, at 8:38 PM, Ahmet Arslan  wrote:
> 
>> Hi,
>> 
>> Sure, have a look at <uniqueKey> definition in example schema.xml 
>> 
>> http://wiki.apache.org/solr/UniqueKey
>> 
>> 
>> 
>> 
>> On Saturday, May 31, 2014 3:35 AM, PeriS  wrote:
>> No. Is there a way to have the primary key of my entity be the unique key?
>> 
>> 
>> On May 30, 2014, at 7:00 PM, Ahmet Arslan  wrote:
>> 
>>> Hi,
>>> 
>>> Do you have uniqueKey defined in schema.xml ?
>>> 
>>> 
>>> 
>>> On Saturday, May 31, 2014 1:23 AM, PeriS  
>>> wrote:
>>> Hi,
>>> 
>>> I have followed the documentation to set up my delta query, but when I call 
>>> the delta-import, the index is happening again for the same  record and 
>>> ends up being indexed twice. Any clues please?
>>> 
>>> Thanks
>>> -Peri
>>> 
>>> 
>>> 
>>> 
>>> 
>>> *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
>>> recipient, please delete without copying and kindly advise us by e-mail of 
>>> the mistake in delivery.
>>> NOTE: Regardless of content, this e-mail shall not operate to bind HTC 
>>> Global Services to any order or other contract unless pursuant to explicit 
>>> written agreement or government initiative expressly permitting the use of 
>>> e-mail for such purpose.
>>> 
>>> 
>> 
>>> Thank you,
>>> Peri Subrahmanya
>>> HTC Global Services 
>>> (KOLE)
>>> Cell: (+1) 618.407.3521
>>> Skype/Gtalk: peri.subrahmanya
>>>   
>>> *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
>>> recipient, please delete without copying and kindly advise us by e-mail of 
>>> the mistake in delivery.
>>> NOTE: Regardless of content, this e-mail shall not operate to bind HTC 
>>> Global Services to any order or other contract unless pursuant to explicit 
>>> written agreement or government initiative expressly permitting the use of 
>>> e-mail for such purpose.
> 
> 
> 
>> 
>> 
>> 
>> 
>> 
>> 
>> *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
>> recipient, please delete without copying and kindly advise us by e-mail of 
>> the mistake in delivery.
>> NOTE: Regardless of content, this e-mail shall not operate to bind HTC 
>> Global Services to any order or other contract unless pursuant to explicit 
>> written agreement or government initiative expressly permitting the use of 
>> e-mail for such purpose.
>> 
>> 
> 
>> Thank you,
>> Peri Subrahmanya
>> HTC Global Services 
>> (KOLE)
>> Cell: (+1) 618.407.3521
>> Skype/Gtalk: peri.subrahmanya
>>   
>> *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
>> recipient, please delete without copying and kindly advise us by e-mail of 
>> the mistake in delivery.
>> NOTE: Regardless of content, this e-mail shall not operate to bind HTC 
>> Global Services to any order or other contract unless pursuant to explicit 
>> written agreement or government initiative expressly permitting the use of 
>> e-mail for such purpose.
> 
> 
> 
> *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
> recipient, please delete without copying and kindly advise us by e-mail of 
> the mistake in delivery.
> NOTE: Regardless of content, this e-mail shall not operate to bind HTC Global 
> Services to any order or other contract unless pursuant to explicit written 
> agreement or government initiative expressly permitting the use of e-mail for 
> such purpose.
> 
> 




*** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
recipient, please delete without copying and kindly advise us by e-mail of the 
mistake in delivery.
NOTE: Regardless of content, this e-mail shall not operate to bind HTC Global 
Services to any order or other contract unless pursuant to explicit written 
agreement or government initiative expressly permitting the use of e-mail for 
such purpose.


Re: slow performance on simple filter

2014-05-31 Thread Yonik Seeley
On Sat, May 31, 2014 at 8:47 AM, mizayah  wrote:
> I'll show you my full query
>
> it's a really simple one
> q=*:* and fq=class_name:CdnFile
>
> the debug output shows that the process phase of the query is what takes so long;
> the single filter is the critical part here.

400ms is too long... something is strange.
One possibility is that the part of the index used to generate the
filter was not in OS cache and thus disk IO needed to be performed to
generate the filter.

-Yonik
http://heliosearch.org - facet functions, subfacets, off-heap filters&fieldcache
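
A quick way to test the cold-cache theory above is to run the identical filtered
query twice in a row and compare QTime: if the second run is dramatically faster,
the first ~400 ms was dominated by disk IO and filterCache/OS-cache population
rather than by the filter itself. A minimal SolrJ sketch (the core URL is a
placeholder):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Runs the same fq twice; a large drop in QTime on the second run points to
// cold caches rather than inherent filter cost.
public class FilterWarmupCheck {
    public static void main(String[] args) throws SolrServerException {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder
        SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery("class_name:CdnFile");
        q.setRows(0);
        System.out.println("cold QTime=" + solr.query(q).getQTime());
        System.out.println("warm QTime=" + solr.query(q).getQTime());
    }
}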


Re: slow performance on simple filter

2014-05-31 Thread mizayah
I'll show you my full query

it's a really simple one 
q=*:* and fq=class_name:CdnFile

The debug output shows that the process phase of the query is what takes so long;
the single filter is the critical part here.


And the cache is not the answer here. It works well, but I need to know why such
a simple filter can take so long.



(debugQuery output; the XML element names were stripped by the list archive)

parsed filter query: class_name:CdnFile
timing: total 421.0 ms
  prepare: 1.0 ms (query component 1.0, remaining components 0.0)
  process: 420.0 ms (query component 415.0, remainder truncated)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/slow-performance-on-simple-filter-tp4135613p4139050.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH Issue

2014-05-31 Thread Ahmet Arslan
Hi,

Did you restart solr? Can you paste relevant portions of data-config.xml and 
schema.xml?



On Saturday, May 31, 2014 3:46 AM, PeriS  wrote:
I added the primaryKey as the <uniqueKey> and still same result. 

On May 30, 2014, at 8:38 PM, Ahmet Arslan  wrote:

> Hi,
> 
> Sure, have a look at <uniqueKey> definition in example schema.xml 
> 
> http://wiki.apache.org/solr/UniqueKey
> 
> 
> 
> 
> On Saturday, May 31, 2014 3:35 AM, PeriS  wrote:
> No. Is there a way to have the primary key of my entity be the unique key?
> 
> 
> On May 30, 2014, at 7:00 PM, Ahmet Arslan  wrote:
> 
>> Hi,
>> 
>> Do you have uniqueKey defined in schema.xml ?
>> 
>> 
>> 
>> On Saturday, May 31, 2014 1:23 AM, PeriS  wrote:
>> Hi,
>> 
>> I have followed the documentation to set up my delta query, but when I call 
>> the delta-import, the index is happening again for the same  record and ends 
>> up being indexed twice. Any clues please?
>> 
>> Thanks
>> -Peri
>> 
>> 
>> 
>> 
>> 
>> *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
>> recipient, please delete without copying and kindly advise us by e-mail of 
>> the mistake in delivery.
>> NOTE: Regardless of content, this e-mail shall not operate to bind HTC 
>> Global Services to any order or other contract unless pursuant to explicit 
>> written agreement or government initiative expressly permitting the use of 
>> e-mail for such purpose.
>> 
>> 
> 
>> Thank you,
>> Peri Subrahmanya
>> HTC Global Services 
>> (KOLE)
>> Cell: (+1) 618.407.3521
>> Skype/Gtalk: peri.subrahmanya
>>  
>> *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
>> recipient, please delete without copying and kindly advise us by e-mail of 
>> the mistake in delivery.
>> NOTE: Regardless of content, this e-mail shall not operate to bind HTC 
>> Global Services to any order or other contract unless pursuant to explicit 
>> written agreement or government initiative expressly permitting the use of 
>> e-mail for such purpose.



> 
> 
> 
> 
> 
> 
> *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
> recipient, please delete without copying and kindly advise us by e-mail of 
> the mistake in delivery.
> NOTE: Regardless of content, this e-mail shall not operate to bind HTC Global 
> Services to any order or other contract unless pursuant to explicit written 
> agreement or government initiative expressly permitting the use of e-mail for 
> such purpose.
> 
> 

> Thank you,
> Peri Subrahmanya
> HTC Global Services 
> (KOLE)
> Cell: (+1) 618.407.3521
> Skype/Gtalk: peri.subrahmanya
>  
> *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
> recipient, please delete without copying and kindly advise us by e-mail of 
> the mistake in delivery.
> NOTE: Regardless of content, this e-mail shall not operate to bind HTC Global 
> Services to any order or other contract unless pursuant to explicit written 
> agreement or government initiative expressly permitting the use of e-mail for 
> such purpose.



*** DISCLAIMER *** This is a PRIVATE message. If you are not the intended 
recipient, please delete without copying and kindly advise us by e-mail of the 
mistake in delivery.
NOTE: Regardless of content, this e-mail shall not operate to bind HTC Global 
Services to any order or other contract unless pursuant to explicit written 
agreement or government initiative expressly permitting the use of e-mail for 
such purpose.