Questions about integrating SolrCloud with HDFS

2013-12-23 Thread YouPeng Yang
Hi users

 Solr supports writing and reading its index and transaction log files
to the HDFS distributed filesystem.
  I am curious whether there are any further improvements planned for
the integration with HDFS.
  Solr's native replication makes multiple copies of the master node's
index. Because HDFS already replicates the data natively, there should be
no need to do that; it would just require multiple cores in SolrCloud to
share the same index directory in HDFS, right?
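
   For context, the index is stored on HDFS via the HDFS directory factory
in solrconfig.xml; the configuration looks roughly like this (the
solr.hdfs.home path is only an example):

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">hdfs://myhdfs/solr</str>
  </directoryFactory>
  <!-- the transaction log goes to HDFS when the updateLog uses solr.HdfsUpdateLog -->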


   The above supposition is what I want to achieve when integrating
SolrCloud with HDFS (Solr 4.6).
   To keep our application highly available, we still have to handle Solr
replication with some tricks.

   Firstly, note that Solr's index directory is made up of
collectionName/coreNodeName/data/index
collectionName/coreNodeName/data/tlog
   So to achieve this, we want to create multiple cores that use the same
HDFS index directory.

  I have tested this with Solr 4.4 by explicitly indicating the same
coreNodeName.

  For example:
  Step 1: a core was created with name=core1, shard=core_shard1,
collection=collection1 and coreNodeName=core1.
  Step 2: create another core with name=core2, shard=core_shard1,
collection=collection1 and coreNodeName=core1.
  The two cores share the same shard, collection and coreNodeName. As a
result, both cores get the same index data, which is stored in the HDFS
directories:
  hdfs://myhdfs/collection1/core1/data/index
  hdfs://myhdfs/collection1/core1/data/tlog
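
  Issued through the CoreAdmin API, the two CREATE calls look roughly like
this (host and port are placeholders):

  http://host:8080/solr/admin/cores?action=CREATE&name=core1&collection=collection1&shard=core_shard1&coreNodeName=core1
  http://host:8080/solr/admin/cores?action=CREATE&name=core2&collection=collection1&shard=core_shard1&coreNodeName=core1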

  Unfortunately, when Solr 4.6 was released we upgraded, and the approach
above failed. We could not create a core with both an explicit shard and
coreNodeName.
The exceptions are shown in [1].
  Can someone give some help?


Regards
[1]--
64893635 [http-bio-8080-exec-1] INFO  org.apache.solr.cloud.ZkController
?.publishing core=hdfstest3 state=down
64893635 [http-bio-8080-exec-1] INFO  org.apache.solr.cloud.ZkController
?.numShards not found on descriptor - reading it from system property
64893698 [http-bio-8080-exec-1] INFO  org.apache.solr.cloud.ZkController
?.look for our core node name



64951227 [http-bio-8080-exec-17] INFO  org.apache.solr.core.SolrCore
?.[reportCore_201208] webapp=/solr path=/replication
params={slave=false&command=details&wt=javabin&qt=/replication&version=2}
status=0 QTime=107


65213770 [http-bio-8080-exec-1] INFO  org.apache.solr.cloud.ZkController
?.waiting to find shard id in clusterstate for hdfstest3
65533894 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore
?.org.apache.solr.common.SolrException: Error CREATEing SolrCore
'hdfstest3': Could not get shard id for core: hdfstest3
at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:535)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Could not get shard id for
core: hdfstest3
at
org.apache.solr.cloud.ZkController.waitForShardId(ZkController.java:1302)
at
org.apache.solr.cloud.ZkController.doGetShardIdAndNodeNameProcess(ZkController.java:1248)
at
org.apache.solr.cloud.ZkController.preRegister(ZkControll

Re: adding a node to SolrCloud

2013-12-23 Thread Shawn Heisey
On 12/23/2013 12:23 PM, David Santamauro wrote:
> I managed to create 8 new cores and the Solr Admin cloud page showed
> them wonderfully as active replicas.
> 
> The only issue I have is what goes into solr.xml (I'm using tomcat)?
> 
> Putting
>   
> 
> for each of the new cores I created seemed like the reasonable approach
> but when I tested a tomcat restart, the distribution was messed up ...
> for one thing, the cores on the new machine showed up as collections!
> And tomcat never even made it to accept connections for some reason.
> 
> I cleaned everything up with zookeeper so my graph looks like it should
> and I removed that new machine from the distribution (by removing zk
> attributes) and restarted .. all is well again.
> 
> Any idea what could have gone wrong on tomcat restart?

You may have one or more of the SolrCloud 'bootstrap' options on the
startup commandline.  The bootstrap options are intended to be used
once, in order to bootstrap from a non-SolrCloud setup to a SolrCloud setup.
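(These are startup system properties such as -Dbootstrap_conf=true, or
-Dbootstrap_confdir=/path/to/conf together with
-Dcollection.configName=myconf; the path and config name here are only
examples.)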

Between the Collections API and the CoreAdmin API, you should never need
to edit solr.xml (if using the pre-4.4 format) or core.properties files
(if using core discovery, available 4.4 and later) directly.

Thanks,
Shawn



Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-23 Thread Joel Bernstein
Yeah, sounds like a leak might be there. Having the huge tlog might have
just magnified its importance.

Joel Bernstein
Search Engineer at Heliosearch


On Mon, Dec 23, 2013 at 5:17 PM, Greg Preston wrote:

> Interesting.  In my original post, the memory growth (during restart)
> occurs after the tlog is done replaying, but during the merge.
>
> -Greg
>
>
> On Mon, Dec 23, 2013 at 2:06 PM, Joel Bernstein 
> wrote:
> > Greg,
> >
> > There is a memory component to the tlog, which supports realtime gets.
> This
> > memory component grows until there is a commit, so it will appear like a
> > leak. I suspect that replaying a tlog that was big enough to possibly
> cause
> > OOM is also problematic.
> >
> > One thing you might want to try is going to 15 second commits, and then
> > kill the Solr instance between the commits. Then watch the memory as the
> > replaying occurs with the smaller tlog.
> >
> > Joel
> >
> >
> >
> >
> > Joel Bernstein
> > Search Engineer at Heliosearch
> >
> >
> > On Mon, Dec 23, 2013 at 4:17 PM, Greg Preston <
> gpres...@marinsoftware.com>wrote:
> >
> >> Hi Joel,
> >>
> >> Thanks for the suggestion.  I could see how decreasing autoCommit time
> >> would reduce tlog size, and how that could possibly be related to the
> >> original OOM error.  I'm not seeing how that would make any difference
> >> once a tlog exists, though?
> >>
> >> I have a saved off copy of my data dir that has the 13G index and 2.5G
> >> tlog.  So I can reproduce the replay -> merge -> memory usage issue
> >> very quickly.  Changing the autoCommit to possibly avoid the initial
> >> OOM will take a good bit longer to try to reproduce.  I may try that
> >> later in the week.
> >>
> >> -Greg
> >>
> >>
> >> On Mon, Dec 23, 2013 at 12:20 PM, Joel Bernstein 
> >> wrote:
> >> > Hi Greg,
> >> >
> >> > I have a suspicion that the problem might be related to or exacerbated by
> >> > overly large tlogs. Can you adjust your autoCommits to 15 seconds.
> Leave
> >> > openSearcher = false. I would remove the maxDoc as well. If you try
> >> > rerunning under those commit settings it's possible the OOM errors will
> >> stop
> >> > occurring.
> >> >
> >> > Joel
> >> >
> >> > Joel Bernstein
> >> > Search Engineer at Heliosearch
> >> >
> >> >
> >> > On Mon, Dec 23, 2013 at 3:07 PM, Greg Preston <
> >> gpres...@marinsoftware.com>wrote:
> >> >
> >> >> Hello,
> >> >>
> >> >> I'm loading up our solr cloud with data (from a solrj client) and
> >> >> running into a weird memory issue.  I can reliably reproduce the
> >> >> problem.
> >> >>
> >> >> - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
> >> >> - 24 solr nodes (one shard each), spread across 3 physical hosts,
> each
> >> >> host has 256G of memory
> >> >> - index and tlogs on ssd
> >> >> - Xmx=7G, G1GC
> >> >> - Java 1.7.0_25
> >> >> - schema and solrconfig.xml attached
> >> >>
> >> >> I'm using composite routing to route documents with the same clientId
> >> >> to the same shard.  After several hours of indexing, I occasionally
> >> >> see an IndexWriter go OOM.  I think that's a symptom.  When that
> >> >> happens, indexing continues, and that node's tlog starts to grow.
> >> >> When I notice this, I stop indexing, and bounce the problem node.
> >> >> That's where it gets interesting.
> >> >>
> >> >> Upon bouncing, the tlog replays, and then segments merge.  Once the
> >> >> merging is complete, the heap is fairly full, and forced full GC only
> >> >> helps a little.  But if I then bounce the node again, the heap usage
> >> >> goes way down, and stays low until the next segment merge.  I believe
> >> >> segment merges are also what causes the original OOM.
> >> >>
> >> >> More details:
> >> >>
> >> >> Index on disk for this node is ~13G, tlog is ~2.5G.
> >> >> See attached mem1.png.  This is a jconsole view of the heap during
> the
> >> >> following:
> >> >>
> >> >> (Solr cloud node started at the left edge of this graph)
> >> >>
> >> >> A) One CPU core pegged at 100%.  Thread dump shows:
> >> >> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
> >> >> nid=0x7a74 runnable [0x7f5a41c5f000]
> >> >>java.lang.Thread.State: RUNNABLE
> >> >> at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
> >> >> at
> >> >>
> >>
> org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
> >> >> at
> >> >> org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
> >> >> at
> >> >> org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
> >> >> at
> >> >>
> org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
> >> >> at
> >> >> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
> >> >> at
> >> >>
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
> >> >> at
> >> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
> >> >> at
> >> >>
> >>
> org.apache.lucene.index.

Re: Configurable collectors for custom ranking

2013-12-23 Thread Joel Bernstein
Peter,

You actually only need the current score being collected to be in the
request context. So you don't need a map, you just need an object wrapper
around a mutable float.

If you have a page size of X, only the top X scores need to be held onto,
because all the other scores wouldn't have made it into that page anyway, so
they might as well be 0. Because the QueryResultCache caches a larger
window than the page size, you should keep enough scores that the cached
docList is correct. But if you're only dealing with 150K results you could
just keep all the scores in a FloatArrayList and not worry about keeping
the top X scores in a priority queue.

During collect(), hang onto the docIds and scores and build your scaling
info.

During finish(), iterate your docIds and scale the scores as you go.

Set your scaled score into the object wrapper that is in the request
context before you collect each document.

When you call collect on the delegate collectors they will call the custom
value source for each document to perform the sort. Your custom value
source will return whatever the float value is in the request context at
that time.
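
A minimal sketch of that wrapper and value source (the class names and the
"scaledScore" context key are made up for illustration; only the Lucene/Solr
types are real):

  import java.util.Map;
  import org.apache.lucene.index.AtomicReaderContext;
  import org.apache.lucene.queries.function.FunctionValues;
  import org.apache.lucene.queries.function.ValueSource;
  import org.apache.lucene.queries.function.docvalues.FloatDocValues;
  import org.apache.solr.request.SolrRequestInfo;

  // Mutable holder the PostFilter puts into the request context and updates
  // just before delegating each collect() call.
  class CurrentScore {
    float value;
  }

  class ContextScoreValueSource extends ValueSource {
    @Override
    public FunctionValues getValues(Map context, AtomicReaderContext readerContext) {
      // Look up the holder that the PostFilter placed in the request context.
      SolrRequestInfo info = SolrRequestInfo.getRequestInfo();
      final CurrentScore holder = (info == null) ? null
          : (CurrentScore) info.getReq().getContext().get("scaledScore");
      return new FloatDocValues(this) {
        @Override
        public float floatVal(int doc) {
          // Return whatever score the PostFilter set for the doc being collected.
          return holder == null ? 0f : holder.value;
        }
      };
    }

    @Override
    public boolean equals(Object o) { return o instanceof ContextScoreValueSource; }

    @Override
    public int hashCode() { return ContextScoreValueSource.class.hashCode(); }

    @Override
    public String description() { return "score()"; }
  }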

If you're also going to run this postfilter when you're doing a standard
rank by score you'll also need to send down a dummy scorer to the delegate
collectors. Spend some time with the CollapsingQParserPlugin in trunk to
see how the dummy scorer works.

I'll be adding value source collapse criteria to the
CollapsingQParserPlugin this week and it will have a similar interaction
between a PostFilter and value source. So you may want to watch SOLR-5536
to see an example of this.

Joel












Joel Bernstein
Search Engineer at Heliosearch


On Mon, Dec 23, 2013 at 4:03 PM, Peter Keegan wrote:

> Hi Joel,
>
> Could you clarify what would be in the key,value Map added to the
> SearchRequest context? It seems that all the docId/score tuples need to be
> there, including the ones not in the 'top N ScoreDocs' PriorityQueue
> (score=0). If so would the Map be something like:
> "scaled_scores",Map ?
>
> Also, what is the reason for passing score=0 for documents that aren't in
> the PriorityQueue? Will these docs get filtered out before a normal sort by
> score?
>
> Thanks,
> Peter
>
>
> On Thu, Dec 12, 2013 at 11:13 AM, Joel Bernstein 
> wrote:
>
> > The sorting is going to happen in the lower level collectors. You need a
> > value source that returns the score of the document being collected.
> >
> > Here is how you can make this happen:
> >
> > 1) Create an object in your PostFilter that simply holds the current
> score.
> > Place this object in the SearchRequest context map. Update object.score
> as
> > you pass the docs and scores to the lower collectors.
> >
> > 2) Create a values source that checks the SearchRequest context for the
> > object that's holding the current score. Use this object to return the
> > current score when called. For example if you give the value source a
> > handle called "score" a compound function call will look like this:
> > sum(score(), field(x))
> >
> > Joel
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan  > >wrote:
> >
> > > Regarding my original goal, which is to perform a math function using
> the
> > > scaled score and a field value, and sort on the result, how does this
> fit
> > > in? Must I implement another custom PostFilter with a higher cost than
> > the
> > > scale PostFilter?
> > >
> > > Thanks,
> > > Peter
> > >
> > >
> > > On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan  > > >wrote:
> > >
> > > > Thanks very much for the guidance. I'd be happy to donate a working
> > > > solution.
> > > >
> > > > Peter
> > > >
> > > >
> > > > On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein  > > >wrote:
> > > >
> > > >> SOLR-5020 has the commit info, it's mainly changes to
> > SolrIndexSearcher
> > > I
> > > >> believe. They might apply to 4.3.
> > > >> I think as long you have the finish method that's all you'll need.
> If
> > > you
> > > >> can get this working it would be excellent if you could donate back
> > the
> > > >> Scale PostFilter.
> > > >>
> > > >>
> > > >> On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan <
> peterlkee...@gmail.com
> > > >> >wrote:
> > > >>
> > > >> > This is what I was looking for, but the DelegatingCollector
> 'finish'
> > > >> method
> > > >> > doesn't exist in 4.3.0 :(   Can this be patched in and are there
> any
> > > >> other
> > > >> > PostFilter dependencies on 4.5?
> > > >> >
> > > >> > Thanks,
> > > >> > Peter
> > > >> >
> > > >> >
> > > >> > On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein <
> joels...@gmail.com
> > >
> > > >> > wrote:
> > > >> >
> > > >> > > Here is one approach to use in a postfilter
> > > >> > >
> > > >> > > 1) In the collect() method call score for each doc. Use the
> scores
> > > to
> > > >> > > create your scaleInfo.
> > > >> > > 2) Keep a bitset of the hits and a priorityQueue of your top X
> > > >> ScoreDocs.
> > > >> > > 3) Don't delegate any doc

Re: adding a node to SolrCloud

2013-12-23 Thread Greg Preston
I believe you can just define multiple cores:



...

(this is the old style solr.xml.  I don't know how to do it in the newer style)
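
A sketch of what such entries look like in the old-style solr.xml (the core
and collection names are only examples):

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="collname_shard1_replica2" instanceDir="collname_shard1_replica2"
            collection="collname" shard="shard1"/>
      <!-- one <core> entry per replica hosted on this node -->
    </cores>
  </solr>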

Also, make sure you don't define a non-relative <dataDir> in
solrconfig.xml, or you may run into issues with cores trying to use
the same data dir.






-Greg


On Mon, Dec 23, 2013 at 2:16 PM, David Santamauro
 wrote:
> On 12/23/2013 05:03 PM, Greg Preston wrote:
>>>
>>> Yes, I'm well aware of the performance implications, many of which are
>>> mitigated by 2TB of SSD and 512GB RAM
>>
>>
>> I've got a very similar setup in production.  2TB SSD, 256G RAM (128G
>> heaps), and 1 - 1.5 TB of index per node.  We're in the process of
>> splitting that to multiple JVMs per host.  GC pauses were causing ZK
>> timeouts (you should up that in solr.xml).  And resync's after the
>> timeouts took long enough that a large tlog built up (we have near
>> continuous indexing), and we couldn't replay the tlog fast enough to
>> catch up to current.
>
>
> GC pauses are a huge issue in our current production environment (monolithic
> index) and general performance was meager, hence the move to a distributed
> design. We will have 8 nodes with ~ 200GB per node, one shard each and
> performance for single and most multi-term queries has become sub-second and
> throughput has increased 10-fold. Larger boolean queries can still take 2-3s
> but we can live with that.
>
> At any rate, I still can't figure out what my solr.xml is supposed to look
> like on the node with all 8 redundant shards.
>
> David
>
>
>
>> On Mon, Dec 23, 2013 at 2:31 AM, David Santamauro
>>  wrote:
>>>
>>> On 12/22/2013 09:48 PM, Shawn Heisey wrote:


 On 12/22/2013 2:10 PM, David Santamauro wrote:
>
>
> My goal is to have a redundant copy of all 8 currently running, but
> non-redundant shards. This setup (8 nodes with no replicas) was a test
> and it has proven quite functional from a performance perspective.
> Loading, though, takes almost 3 weeks so I'm really not in a position
> to
> redesign the distribution, though I can add nodes.
>
> I have acquired another resource, a very large machine that I'd like to
> use to hold the replicas of the currently deployed 8-nodes.
>
> I realize I can run 8 jetty/tomcats and accomplish my goal but that is
> a
> maintenance headache and is really a last resort. I really would just
> like to be able to deploy this big machine with 'numShards=8'.
>
> Is that possible or do I really need to have 8 other nodes running?



 You don't want to run more than one container or Solr instance per
 machine.  Things can get very confused, and it's too much overhead.
>>>
>>>


 With existing collections, you can simply run the CoreAdmin CREATE

 action on the new node with more resources.

 So you'd do something like this, once for each of the 8 existing parts:



 http://newnode:port/solr/admin/cores?action=CREATE&name=collname_shard1_replica2&collection=collname&shard=shard1

 It will automatically replicate the shard from its current leader.
>>>
>>>
>>>
>>> Fantastic! Clearly my understanding of "collection" vs "core" vs "shard"
>>> was lacking but now I see the relationship better.
>>>
>>>

 One thing to be aware of: With 1.4TB of index data, it might be
 impossible to keep enough of the index in RAM for good performance,
 unless the machine has a terabyte or more of RAM.
>>>
>>>
>>>
>>> Yes, I'm well aware of the performance implications, many of which are
>>> mitigated by 2TB of SSD and 512GB RAM.
>>>
>>> Thanks for the nudge in the right direction. The first node/shard1 is
>>> replicating right now.
>>>
>>> David
>>>
>>>
>>>
>


Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-23 Thread Greg Preston
Interesting.  In my original post, the memory growth (during restart)
occurs after the tlog is done replaying, but during the merge.

-Greg


On Mon, Dec 23, 2013 at 2:06 PM, Joel Bernstein  wrote:
> Greg,
>
> There is a memory component to the tlog, which supports realtime gets. This
> memory component grows until there is a commit, so it will appear like a
> leak. I suspect that replaying a tlog that was big enough to possibly cause
> OOM is also problematic.
>
> One thing you might want to try is going to 15 second commits, and then
> kill the Solr instance between the commits. Then watch the memory as the
> replaying occurs with the smaller tlog.
>
> Joel
>
>
>
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Mon, Dec 23, 2013 at 4:17 PM, Greg Preston 
> wrote:
>
>> Hi Joel,
>>
>> Thanks for the suggestion.  I could see how decreasing autoCommit time
>> would reduce tlog size, and how that could possibly be related to the
>> original OOM error.  I'm not seeing how that would make any difference
>> once a tlog exists, though?
>>
>> I have a saved off copy of my data dir that has the 13G index and 2.5G
>> tlog.  So I can reproduce the replay -> merge -> memory usage issue
>> very quickly.  Changing the autoCommit to possibly avoid the initial
>> OOM will take a good bit longer to try to reproduce.  I may try that
>> later in the week.
>>
>> -Greg
>>
>>
>> On Mon, Dec 23, 2013 at 12:20 PM, Joel Bernstein 
>> wrote:
>> > Hi Greg,
>> >
>> > I have a suspicion that the problem might be related to or exacerbated by
>> > overly large tlogs. Can you adjust your autoCommits to 15 seconds. Leave
>> > openSearcher = false. I would remove the maxDoc as well. If you try
>> > rerunning under those commit settings it's possible the OOM errors will
>> stop
>> > occurring.
>> >
>> > Joel
>> >
>> > Joel Bernstein
>> > Search Engineer at Heliosearch
>> >
>> >
>> > On Mon, Dec 23, 2013 at 3:07 PM, Greg Preston <
>> gpres...@marinsoftware.com>wrote:
>> >
>> >> Hello,
>> >>
>> >> I'm loading up our solr cloud with data (from a solrj client) and
>> >> running into a weird memory issue.  I can reliably reproduce the
>> >> problem.
>> >>
>> >> - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
>> >> - 24 solr nodes (one shard each), spread across 3 physical hosts, each
>> >> host has 256G of memory
>> >> - index and tlogs on ssd
>> >> - Xmx=7G, G1GC
>> >> - Java 1.7.0_25
>> >> - schema and solrconfig.xml attached
>> >>
>> >> I'm using composite routing to route documents with the same clientId
>> >> to the same shard.  After several hours of indexing, I occasionally
>> >> see an IndexWriter go OOM.  I think that's a symptom.  When that
>> >> happens, indexing continues, and that node's tlog starts to grow.
>> >> When I notice this, I stop indexing, and bounce the problem node.
>> >> That's where it gets interesting.
>> >>
>> >> Upon bouncing, the tlog replays, and then segments merge.  Once the
>> >> merging is complete, the heap is fairly full, and forced full GC only
>> >> helps a little.  But if I then bounce the node again, the heap usage
>> >> goes way down, and stays low until the next segment merge.  I believe
>> >> segment merges are also what causes the original OOM.
>> >>
>> >> More details:
>> >>
>> >> Index on disk for this node is ~13G, tlog is ~2.5G.
>> >> See attached mem1.png.  This is a jconsole view of the heap during the
>> >> following:
>> >>
>> >> (Solr cloud node started at the left edge of this graph)
>> >>
>> >> A) One CPU core pegged at 100%.  Thread dump shows:
>> >> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
>> >> nid=0x7a74 runnable [0x7f5a41c5f000]
>> >>java.lang.Thread.State: RUNNABLE
>> >> at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
>> >> at
>> >>
>> org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
>> >> at
>> >> org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
>> >> at
>> >> org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
>> >> at
>> >> org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
>> >> at
>> >> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
>> >> at
>> >> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
>> >> at
>> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
>> >> at
>> >>
>> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>> >> at
>> >>
>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>> >>
>> >> B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
>> >> memory freed.  Thread dump shows:
>> >> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
>> >> nid=0x7a74 runnable [0x7f5a41c5f000]
>> >>java.lang.Thread.State: RUNNABLE
>> >> at
>> >>
>> org.

Re: adding a node to SolrCloud

2013-12-23 Thread David Santamauro

On 12/23/2013 05:03 PM, Greg Preston wrote:

Yes, I'm well aware of the performance implications, many of which are 
mitigated by 2TB of SSD and 512GB RAM


I've got a very similar setup in production.  2TB SSD, 256G RAM (128G
heaps), and 1 - 1.5 TB of index per node.  We're in the process of
splitting that to multiple JVMs per host.  GC pauses were causing ZK
timeouts (you should up that in solr.xml).  And resync's after the
timeouts took long enough that a large tlog built up (we have near
continuous indexing), and we couldn't replay the tlog fast enough to
catch up to current.


GC pauses are a huge issue in our current production environment 
(monolithic index) and general performance was meager, hence the move to 
a distributed design. We will have 8 nodes with ~ 200GB per node, one 
shard each and performance for single and most multi-term queries has 
become sub-second and throughput has increased 10-fold. Larger boolean 
queries can still take 2-3s but we can live with that.


At any rate, I still can't figure out what my solr.xml is supposed to 
look like on the node with all 8 redundant shards.


David



On Mon, Dec 23, 2013 at 2:31 AM, David Santamauro
 wrote:

On 12/22/2013 09:48 PM, Shawn Heisey wrote:


On 12/22/2013 2:10 PM, David Santamauro wrote:


My goal is to have a redundant copy of all 8 currently running, but
non-redundant shards. This setup (8 nodes with no replicas) was a test
and it has proven quite functional from a performance perspective.
Loading, though, takes almost 3 weeks so I'm really not in a position to
redesign the distribution, though I can add nodes.

I have acquired another resource, a very large machine that I'd like to
use to hold the replicas of the currently deployed 8-nodes.

I realize I can run 8 jetty/tomcats and accomplish my goal but that is a
maintenance headache and is really a last resort. I really would just
like to be able to deploy this big machine with 'numShards=8'.

Is that possible or do I really need to have 8 other nodes running?



You don't want to run more than one container or Solr instance per
machine.  Things can get very confused, and it's too much overhead.





With existing collections, you can simply run the CoreAdmin CREATE

action on the new node with more resources.

So you'd do something like this, once for each of the 8 existing parts:


http://newnode:port/solr/admin/cores?action=CREATE&name=collname_shard1_replica2&collection=collname&shard=shard1

It will automatically replicate the shard from its current leader.



Fantastic! Clearly my understanding of "collection" vs "core" vs "shard"
was lacking but now I see the relationship better.




One thing to be aware of: With 1.4TB of index data, it might be
impossible to keep enough of the index in RAM for good performance,
unless the machine has a terabyte or more of RAM.



Yes, I'm well aware of the performance implications, many of which are
mitigated by 2TB of SSD and 512GB RAM.

Thanks for the nudge in the right direction. The first node/shard1 is
replicating right now.

David







Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-23 Thread Joel Bernstein
Greg,

There is a memory component to the tlog, which supports realtime gets. This
memory component grows until there is a commit, so it will appear like a
leak. I suspect that replaying a tlog that was big enough to possibly cause
OOM is also problematic.

One thing you might want to try is going to 15 second commits, and then
kill the Solr instance between the commits. Then watch the memory as the
replaying occurs with the smaller tlog.

Joel




Joel Bernstein
Search Engineer at Heliosearch


On Mon, Dec 23, 2013 at 4:17 PM, Greg Preston wrote:

> Hi Joel,
>
> Thanks for the suggestion.  I could see how decreasing autoCommit time
> would reduce tlog size, and how that could possibly be related to the
> original OOM error.  I'm not seeing how that would make any difference
> once a tlog exists, though?
>
> I have a saved off copy of my data dir that has the 13G index and 2.5G
> tlog.  So I can reproduce the replay -> merge -> memory usage issue
> very quickly.  Changing the autoCommit to possibly avoid the initial
> OOM will take a good bit longer to try to reproduce.  I may try that
> later in the week.
>
> -Greg
>
>
> On Mon, Dec 23, 2013 at 12:20 PM, Joel Bernstein 
> wrote:
> > Hi Greg,
> >
> > I have a suspicion that the problem might be related to or exacerbated by
> > overly large tlogs. Can you adjust your autoCommits to 15 seconds. Leave
> > openSearcher = false. I would remove the maxDoc as well. If you try
> > rerunning under those commit settings it's possible the OOM errors will
> stop
> > occurring.
> >
> > Joel
> >
> > Joel Bernstein
> > Search Engineer at Heliosearch
> >
> >
> > On Mon, Dec 23, 2013 at 3:07 PM, Greg Preston <
> gpres...@marinsoftware.com>wrote:
> >
> >> Hello,
> >>
> >> I'm loading up our solr cloud with data (from a solrj client) and
> >> running into a weird memory issue.  I can reliably reproduce the
> >> problem.
> >>
> >> - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
> >> - 24 solr nodes (one shard each), spread across 3 physical hosts, each
> >> host has 256G of memory
> >> - index and tlogs on ssd
> >> - Xmx=7G, G1GC
> >> - Java 1.7.0_25
> >> - schema and solrconfig.xml attached
> >>
> >> I'm using composite routing to route documents with the same clientId
> >> to the same shard.  After several hours of indexing, I occasionally
> >> see an IndexWriter go OOM.  I think that's a symptom.  When that
> >> happens, indexing continues, and that node's tlog starts to grow.
> >> When I notice this, I stop indexing, and bounce the problem node.
> >> That's where it gets interesting.
> >>
> >> Upon bouncing, the tlog replays, and then segments merge.  Once the
> >> merging is complete, the heap is fairly full, and forced full GC only
> >> helps a little.  But if I then bounce the node again, the heap usage
> >> goes way down, and stays low until the next segment merge.  I believe
> >> segment merges are also what causes the original OOM.
> >>
> >> More details:
> >>
> >> Index on disk for this node is ~13G, tlog is ~2.5G.
> >> See attached mem1.png.  This is a jconsole view of the heap during the
> >> following:
> >>
> >> (Solr cloud node started at the left edge of this graph)
> >>
> >> A) One CPU core pegged at 100%.  Thread dump shows:
> >> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
> >> nid=0x7a74 runnable [0x7f5a41c5f000]
> >>java.lang.Thread.State: RUNNABLE
> >> at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
> >> at
> >>
> org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
> >> at
> >> org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
> >> at
> >> org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
> >> at
> >> org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
> >> at
> >> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
> >> at
> >> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
> >> at
> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
> >> at
> >>
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
> >> at
> >>
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
> >>
> >> B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
> >> memory freed.  Thread dump shows:
> >> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
> >> nid=0x7a74 runnable [0x7f5a41c5f000]
> >>java.lang.Thread.State: RUNNABLE
> >> at
> >>
> org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
> >> at
> >>
> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
> >> at
> >>
> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.ja

Re: adding a node to SolrCloud

2013-12-23 Thread Greg Preston
>Yes, I'm well aware of the performance implications, many of which are 
>mitigated by 2TB of SSD and 512GB RAM

I've got a very similar setup in production.  2TB SSD, 256G RAM (128G
heaps), and 1 - 1.5 TB of index per node.  We're in the process of
splitting that to multiple JVMs per host.  GC pauses were causing ZK
timeouts (you should up that in solr.xml).  And resync's after the
timeouts took long enough that a large tlog built up (we have near
continuous indexing), and we couldn't replay the tlog fast enough to
catch up to current.

If you're going to have a mostly static index, then it may be less of an issue.

-Greg


On Mon, Dec 23, 2013 at 2:31 AM, David Santamauro
 wrote:
> On 12/22/2013 09:48 PM, Shawn Heisey wrote:
>>
>> On 12/22/2013 2:10 PM, David Santamauro wrote:
>>>
>>> My goal is to have a redundant copy of all 8 currently running, but
>>> non-redundant shards. This setup (8 nodes with no replicas) was a test
>>> and it has proven quite functional from a performance perspective.
>>> Loading, though, takes almost 3 weeks so I'm really not in a position to
>>> redesign the distribution, though I can add nodes.
>>>
>>> I have acquired another resource, a very large machine that I'd like to
>>> use to hold the replicas of the currently deployed 8-nodes.
>>>
>>> I realize I can run 8 jetty/tomcats and accomplish my goal but that is a
>>> maintenance headache and is really a last resort. I really would just
>>> like to be able to deploy this big machine with 'numShards=8'.
>>>
>>> Is that possible or do I really need to have 8 other nodes running?
>>
>>
>> You don't want to run more than one container or Solr instance per
>> machine.  Things can get very confused, and it's too much overhead.
>
>>
>>
>> With existing collections, you can simply run the CoreAdmin CREATE
>>
>> action on the new node with more resources.
>>
>> So you'd do something like this, once for each of the 8 existing parts:
>>
>>
>> http://newnode:port/solr/admin/cores?action=CREATE&name=collname_shard1_replica2&collection=collname&shard=shard1
>>
>> It will automatically replicate the shard from its current leader.
>
>
> Fantastic! Clearly my understanding of "collection" vs "core" vs "shard"
> was lacking but now I see the relationship better.
>
>
>>
>> One thing to be aware of: With 1.4TB of index data, it might be
>> impossible to keep enough of the index in RAM for good performance,
>> unless the machine has a terabyte or more of RAM.
>
>
> Yes, I'm well aware of the performance implications, many of which are
> mitigated by 2TB of SSD and 512GB RAM.
>
> Thanks for the nudge in the right direction. The first node/shard1 is
> replicating right now.
>
> David
>
>
>


Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-23 Thread Greg Preston
Hi Joel,

Thanks for the suggestion.  I could see how decreasing autoCommit time
would reduce tlog size, and how that could possibly be related to the
original OOM error.  I'm not seeing how that would make any difference
once a tlog exists, though?

I have a saved off copy of my data dir that has the 13G index and 2.5G
tlog.  So I can reproduce the replay -> merge -> memory usage issue
very quickly.  Changing the autoCommit to possibly avoid the initial
OOM will take a good bit longer to try to reproduce.  I may try that
later in the week.

-Greg


On Mon, Dec 23, 2013 at 12:20 PM, Joel Bernstein  wrote:
> Hi Greg,
>
> I have a suspicion that the problem might be related to or exacerbated by
> overly large tlogs. Can you adjust your autoCommits to 15 seconds. Leave
> openSearcher = false. I would remove the maxDoc as well. If you try
> rerunning under those commit settings it's possible the OOM errors will stop
> occurring.
>
> Joel
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Mon, Dec 23, 2013 at 3:07 PM, Greg Preston 
> wrote:
>
>> Hello,
>>
>> I'm loading up our solr cloud with data (from a solrj client) and
>> running into a weird memory issue.  I can reliably reproduce the
>> problem.
>>
>> - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
>> - 24 solr nodes (one shard each), spread across 3 physical hosts, each
>> host has 256G of memory
>> - index and tlogs on ssd
>> - Xmx=7G, G1GC
>> - Java 1.7.0_25
>> - schema and solrconfig.xml attached
>>
>> I'm using composite routing to route documents with the same clientId
>> to the same shard.  After several hours of indexing, I occasionally
>> see an IndexWriter go OOM.  I think that's a symptom.  When that
>> happens, indexing continues, and that node's tlog starts to grow.
>> When I notice this, I stop indexing, and bounce the problem node.
>> That's where it gets interesting.
>>
>> Upon bouncing, the tlog replays, and then segments merge.  Once the
>> merging is complete, the heap is fairly full, and forced full GC only
>> helps a little.  But if I then bounce the node again, the heap usage
>> goes way down, and stays low until the next segment merge.  I believe
>> segment merges are also what causes the original OOM.
>>
>> More details:
>>
>> Index on disk for this node is ~13G, tlog is ~2.5G.
>> See attached mem1.png.  This is a jconsole view of the heap during the
>> following:
>>
>> (Solr cloud node started at the left edge of this graph)
>>
>> A) One CPU core pegged at 100%.  Thread dump shows:
>> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
>> nid=0x7a74 runnable [0x7f5a41c5f000]
>>java.lang.Thread.State: RUNNABLE
>> at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
>> at
>> org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
>> at
>> org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
>> at
>> org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
>> at
>> org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
>> at
>> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
>> at
>> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
>> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
>> at
>> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>> at
>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>>
>> B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
>> memory freed.  Thread dump shows:
>> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
>> nid=0x7a74 runnable [0x7f5a41c5f000]
>>java.lang.Thread.State: RUNNABLE
>> at
>> org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
>> at
>> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
>> at
>> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
>> at
>> org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
>> at
>> org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
>> at
>> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
>> at
>> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
>> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
>> at
>> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>> at
>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>>
>> C) One CPU core pegged at 100%.  Manually triggered GC.  No memory
>> freed.  Thread dump shows:
>> "Lucene Merge

Re: Configurable collectors for custom ranking

2013-12-23 Thread Peter Keegan
Hi Joel,

Could you clarify what would be in the key,value Map added to the
SearchRequest context? It seems that all the docId/score tuples need to be
there, including the ones not in the 'top N ScoreDocs' PriorityQueue
(score=0). If so would the Map be something like:
"scaled_scores",Map ?

Also, what is the reason for passing score=0 for documents that aren't in
the PriorityQueue? Will these docs get filtered out before a normal sort by
score?

Thanks,
Peter


On Thu, Dec 12, 2013 at 11:13 AM, Joel Bernstein  wrote:

> The sorting is going to happen in the lower level collectors. You need a
> value source that returns the score of the document being collected.
>
> Here is how you can make this happen:
>
> 1) Create an object in your PostFilter that simply holds the current score.
> Place this object in the SearchRequest context map. Update object.score as
> you pass the docs and scores to the lower collectors.
>
> 2) Create a values source that checks the SearchRequest context for the
> object that's holding the current score. Use this object to return the
> current score when called. For example if you give the value source a
> handle called "score" a compound function call will look like this:
> sum(score(), field(x))
>
> Joel
>
>
>
>
>
>
>
>
>
>
> On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan  >wrote:
>
> > Regarding my original goal, which is to perform a math function using the
> > scaled score and a field value, and sort on the result, how does this fit
> > in? Must I implement another custom PostFilter with a higher cost than
> the
> > scale PostFilter?
> >
> > Thanks,
> > Peter
> >
> >
> > On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan  > >wrote:
> >
> > > Thanks very much for the guidance. I'd be happy to donate a working
> > > solution.
> > >
> > > Peter
> > >
> > >
> > > On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein  > >wrote:
> > >
> > >> SOLR-5020 has the commit info, it's mainly changes to
> SolrIndexSearcher
> > I
> > >> believe. They might apply to 4.3.
> > >> I think as long you have the finish method that's all you'll need. If
> > you
> > >> can get this working it would be excellent if you could donate back
> the
> > >> Scale PostFilter.
> > >>
> > >>
> > >> On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan  > >> >wrote:
> > >>
> > >> > This is what I was looking for, but the DelegatingCollector 'finish'
> > >> method
> > >> > doesn't exist in 4.3.0 :(   Can this be patched in and are there any
> > >> other
> > >> > PostFilter dependencies on 4.5?
> > >> >
> > >> > Thanks,
> > >> > Peter
> > >> >
> > >> >
> > >> > On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein  >
> > >> > wrote:
> > >> >
> > >> > > Here is one approach to use in a postfilter
> > >> > >
> > >> > > 1) In the collect() method call score for each doc. Use the scores
> > to
> > >> > > create your scaleInfo.
> > >> > > 2) Keep a bitset of the hits and a priorityQueue of your top X
> > >> ScoreDocs.
> > >> > > 3) Don't delegate any documents to lower collectors in the
> collect()
> > >> > > method.
> > >> > > 4) In the finish method create a score mapping (use the hppc
> > >> > > IntFloatOpenHashMap) with your top X docIds pointing to their
> score,
> > >> > using
> > >> > > the priorityQueue created in step 2. Then iterate the bitset (also
> > >> > created
> > >> > > in step 2) sending down each doc to the lower collectors,
> retrieving
> > >> and
> > >> > > scaling the score from the score map. If the document is not in
> the
> > >> score
> > >> > > map then send down 0.
> > >> > >
> > >> > > You'll have setup a dummy scorer to feed to lower collectors. The
> > >> > > CollapsingQParserPlugin has an example of how to do this.
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Wed, Dec 11, 2013 at 2:05 PM, Peter Keegan <
> > peterlkee...@gmail.com
> > >> > > >wrote:
> > >> > >
> > >> > > > Hi Joel,
> > >> > > >
> > >> > > > I thought about using a PostFilter, but the problem is that the
> > >> 'scale'
> > >> > > > function must be done after all matching docs have been scored
> but
> > >> > before
> > >> > > > adding them to the PriorityQueue that sorts just the rows to be
> > >> > returned.
> > >> > > > Doing the 'scale' function wrapped in a 'query' is proving to be
> > too
> > >> > slow
> > >> > > > when it visits every document in the index.
> > >> > > >
> > >> > > > In the Collector, I can see how to get the field values like
> this:
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> indexSearcher.getSchema().getField("field(myfield").getType().getValueSource(SchemaField,
> > >> > > > QParser).getValues()
> > >> > > >
> > >> > > > But, 'getValueSource' needs a QParser, which isn't available.
> > >> > > > And I can't create a QParser without a SolrQueryRequest, which
> > isn't
> > >> > > > available.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Peter
> > >> > > >
> > >> > > >
> > >> > > > On Wed, Dec 11, 2013 at 1:48 PM, Joel Bernstein <
> > joels...@gmail.com
> > >> >
> > >> > > > wrote:
> > >> > > >
> > >

Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-23 Thread Joel Bernstein
Hi Greg,

I have a suspicion that the problem might be related to or exacerbated by
overly large tlogs. Can you adjust your autoCommits to 15 seconds. Leave
openSearcher = false. I would remove the maxDoc as well. If you try
rerunning under those commit settings it's possible the OOM errors will stop
occurring.
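
Something along these lines in solrconfig.xml (a sketch; 15000 ms = 15
seconds):

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>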

Joel

Joel Bernstein
Search Engineer at Heliosearch


On Mon, Dec 23, 2013 at 3:07 PM, Greg Preston wrote:

> Hello,
>
> I'm loading up our solr cloud with data (from a solrj client) and
> running into a weird memory issue.  I can reliably reproduce the
> problem.
>
> - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
> - 24 solr nodes (one shard each), spread across 3 physical hosts, each
> host has 256G of memory
> - index and tlogs on ssd
> - Xmx=7G, G1GC
> - Java 1.7.0_25
> - schema and solrconfig.xml attached
>
> I'm using composite routing to route documents with the same clientId
> to the same shard.  After several hours of indexing, I occasionally
> see an IndexWriter go OOM.  I think that's a symptom.  When that
> happens, indexing continues, and that node's tlog starts to grow.
> When I notice this, I stop indexing, and bounce the problem node.
> That's where it gets interesting.
>
> Upon bouncing, the tlog replays, and then segments merge.  Once the
> merging is complete, the heap is fairly full, and forced full GC only
> helps a little.  But if I then bounce the node again, the heap usage
> goes way down, and stays low until the next segment merge.  I believe
> segment merges are also what causes the original OOM.
>
> More details:
>
> Index on disk for this node is ~13G, tlog is ~2.5G.
> See attached mem1.png.  This is a jconsole view of the heap during the
> following:
>
> (Solr cloud node started at the left edge of this graph)
>
> A) One CPU core pegged at 100%.  Thread dump shows:
> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
> nid=0x7a74 runnable [0x7f5a41c5f000]
>java.lang.Thread.State: RUNNABLE
> at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
> at
> org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
> at
> org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
> at
> org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
> at
> org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
> at
> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
> at
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
> at
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
> at
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>
> B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
> memory freed.  Thread dump shows:
> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
> nid=0x7a74 runnable [0x7f5a41c5f000]
>java.lang.Thread.State: RUNNABLE
> at
> org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
> at
> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
> at
> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
> at
> org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
> at
> org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
> at
> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
> at
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
> at
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
> at
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>
> C) One CPU core pegged at 100%.  Manually triggered GC.  No memory
> freed.  Thread dump shows:
> "Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
> nid=0x7a74 runnable [0x7f5a41c5f000]
>java.lang.Thread.State: RUNNABLE
> at
> org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
> at
> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:108)
> at
> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
> at
> org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
> at
> org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
> at
> org.apache.lucene.index.SegmentMerger.m

Re: how to best convert some term in q to a fq

2013-12-23 Thread Joel Bernstein
I would suggest handling this in the client. You could write custom Solr
code as well, but it would be more complicated because you'd be working
with Solr's APIs.
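
A rough client-side sketch with SolrJ (the country list and the "country"
field name are assumptions):

  import java.util.Arrays;
  import java.util.HashSet;
  import java.util.Set;
  import org.apache.solr.client.solrj.SolrQuery;

  // Hypothetical helper: split the user's input, move any known country term
  // into a filter query, and keep the remaining terms as the main query.
  public class CountryToFq {
    private static final Set<String> COUNTRIES =
        new HashSet<String>(Arrays.asList("france", "spain", "germany"));

    public static SolrQuery build(String userInput) {
      SolrQuery query = new SolrQuery();
      StringBuilder q = new StringBuilder();
      for (String term : userInput.trim().split("\\s+")) {
        if (COUNTRIES.contains(term.toLowerCase())) {
          query.addFilterQuery("country:" + term.toLowerCase());
        } else {
          if (q.length() > 0) q.append(' ');
          q.append(term);
        }
      }
      query.setQuery(q.toString());
      return query; // "apple pear france" -> q=apple pear, fq=country:france
    }
  }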

Joel Bernstein
Search Engineer at Heliosearch


On Mon, Dec 23, 2013 at 2:36 PM, jmlucjav  wrote:

> Hi,
>
> I have this scenario that I think is not unusual: Solr will get a user
> entered query string like 'apple pear france'.
>
> I need to do this: if any of the terms is a country, then change the query
> params to move that term to a fq, i.e:
> q=apple pear france
> to
> q=apple pear&fq=country:france
>
> What do you guys think would be the best way to implement this?
> - custom searchcomponent or queryparser
> - servlet in same jetty as solr
> - client code
>
> To simplify, consider countries are just a single term.
>
> Any pointer to an example to base this on would be great. thanks
>


Possible memory leak after segment merge? (related to DocValues?)

2013-12-23 Thread Greg Preston
Hello,

I'm loading up our solr cloud with data (from a solrj client) and
running into a weird memory issue.  I can reliably reproduce the
problem.

- Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
- 24 solr nodes (one shard each), spread across 3 physical hosts, each
host has 256G of memory
- index and tlogs on ssd
- Xmx=7G, G1GC
- Java 1.7.0_25
- schema and solrconfig.xml attached

I'm using composite routing to route documents with the same clientId
to the same shard.  After several hours of indexing, I occasionally
see an IndexWriter go OOM.  I think that's a symptom.  When that
happens, indexing continues, and that node's tlog starts to grow.
When I notice this, I stop indexing, and bounce the problem node.
That's where it gets interesting.

Upon bouncing, the tlog replays, and then segments merge.  Once the
merging is complete, the heap is fairly full, and forced full GC only
helps a little.  But if I then bounce the node again, the heap usage
goes way down, and stays low until the next segment merge.  I believe
segment merges are also what causes the original OOM.

More details:

Index on disk for this node is ~13G, tlog is ~2.5G.
See attached mem1.png.  This is a jconsole view of the heap during the
following:

(Solr cloud node started at the left edge of this graph)

A) One CPU core pegged at 100%.  Thread dump shows:
"Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
nid=0x7a74 runnable [0x7f5a41c5f000]
   java.lang.Thread.State: RUNNABLE
at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
at 
org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
at 
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
memory freed.  Thread dump shows:
"Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
nid=0x7a74 runnable [0x7f5a41c5f000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
at 
org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
at 
org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
at 
org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
at 
org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

C) One CPU core pegged at 100%.  Manually triggered GC.  No memory
freed.  Thread dump shows:
"Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
nid=0x7a74 runnable [0x7f5a41c5f000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
at 
org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:108)
at 
org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
at 
org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
at 
org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

D) One CPU core pegged at 100%.  Thread dump shows:
"Lucene Merge Thread #0" daemon prio=10 tid=0x7f5a3c064800
nid=0x7a74 runnable [0x7f5a41c5f000]
   java.lang.Thread.State: RU

how to best convert some term in q to a fq

2013-12-23 Thread jmlucjav
Hi,

I have this scenario that I think is not unusual: Solr will get a user
entered query string like 'apple pear france'.

I need to do this: if any of the terms is a country, then change the query
params to move that term to a fq, i.e:
q=apple pear france
to
q=apple pear&fq=country:france

What do you guys think would be the best way to implement this?
- custom searchcomponent or queryparser
- servlet in same jetty as solr
- client code

To simplify, consider countries are just a single term.

Any pointer to an example to base this on would be great. thanks
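
(For the client-code option, here is a rough SolrJ sketch of the kind of
preprocessing I have in mind. isCountry() is a hypothetical lookup against my
own country list, so treat this as a sketch rather than a drop-in solution:)

  import org.apache.solr.client.solrj.SolrQuery;

  public class CountryToFilterQuery {

      // Hypothetical lookup; in practice this would consult a real country list.
      static boolean isCountry(String term) {
          return term.equalsIgnoreCase("france");
      }

      // Turns "apple pear france" into q=apple pear&fq=country:france
      public static SolrQuery build(String userInput) {
          SolrQuery query = new SolrQuery();
          StringBuilder q = new StringBuilder();
          for (String term : userInput.trim().split("\\s+")) {
              if (isCountry(term)) {
                  query.addFilterQuery("country:" + term);
              } else {
                  if (q.length() > 0) q.append(' ');
                  q.append(term);
              }
          }
          query.setQuery(q.toString());
          return query;
      }
  }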


Re: adding a node to SolrCloud

2013-12-23 Thread David Santamauro


Shawn,

I managed to create 8 new cores and the Solr Admin cloud page showed 
them wonderfully as active replicas.


The only issue I have is what goes into solr.xml (I'm using Tomcat)?

Putting a <core .../> entry for each of the new cores I created seemed like 
the reasonable approach, but when I tested a Tomcat restart, the distribution 
was messed up ... for one thing, the cores on the new machine showed up as 
collections! And Tomcat never even got as far as accepting connections, for 
some reason.


I cleaned everything up with ZooKeeper so my graph looks like it should, and 
I removed that new machine from the distribution (by removing the zk 
attributes) and restarted ... all is well again.


Any idea what could have gone wrong on the Tomcat restart?

thanks.




On 12/22/2013 09:48 PM, Shawn Heisey wrote:

On 12/22/2013 2:10 PM, David Santamauro wrote:

My goal is to have a redundant copy of all 8 currently running, but
non-redundant shards. This setup (8 nodes with no replicas) was a test
and it has proven quite functional from a performance perspective.
Loading, though, takes almost 3 weeks so I'm really not in a position to
redesign the distribution, though I can add nodes.

I have acquired another resource, a very large machine that I'd like to
use to hold the replicas of the currently deployed 8-nodes.

I realize I can run 8 jetty/tomcats and accomplish my goal but that is a
maintenance headache and is really a last resort. I really would just
like to be able to deploy this big machine with 'numShards=8'.

Is that possible or do I really need to have 8 other nodes running?


You don't want to run more than one container or Solr instance per
machine.  Things can get very confused, and it's too much overhead.
Also, you shouldn't start Solr with the numShards parameter on the
commandline.  That should be given when you create each collection.

With existing collections, you can simply run the CoreAdmin CREATE
action on the new node with more resources.

http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin

So you'd do something like this, once for each of the 8 existing parts:

http://newnode:port/solr/admin/cores?action=CREATE&name=collname_shard1_replica2&collection=collname&shard=shard1

It will automatically replicate the shard from its current leader.
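
(As an aside, the same CREATE call can also be issued through SolrJ. This is
only a rough sketch with placeholder names, and the method names are recalled
from the 4.x SolrJ API, so double-check them against your version:)

  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.request.CoreAdminRequest;

  public class AddReplicaOnNewNode {
      public static void main(String[] args) throws Exception {
          // Point at the new node's Solr base URL (placeholder host/port).
          HttpSolrServer server = new HttpSolrServer("http://newnode:8983/solr");
          CoreAdminRequest.Create create = new CoreAdminRequest.Create();
          create.setCoreName("collname_shard1_replica2");
          create.setCollection("collname");
          create.setShardId("shard1");
          create.process(server);   // repeat with shard2..shard8 for the other parts
          server.shutdown();
      }
  }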

One thing to be aware of: With 1.4TB of index data, it might be
impossible to keep enough of the index in RAM for good performance,
unless the machine has a terabyte or more of RAM.

http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache

Thanks,
Shawn





Re: indexing delay due to zookeeper election

2013-12-23 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Sure. https://issues.apache.org/jira/i#browse/SOLR-5577 filed. Thanks.

- Original Message -
From: solr-user@lucene.apache.org
To: Christine Poerschke (BLOOMBERG/ LONDON), solr-user@lucene.apache.org
At: Dec 23 2013 18:12:50

Interesting stuff! This is expected but not really something I have thought
about yet.

Can you file a JIRA issue? I think we want to try and tackle this with code.

We currently reject updates when we lose our connection to ZooKeeper. We
are pretty strict about this. I think you could reasonably be less strict
(eg not start rejecting updates for a few seconds).

- Mark


On Mon, Dec 23, 2013 at 12:49 PM, Christine Poerschke (BLOOMBERG/ LONDON) <
cpoersc...@bloomberg.net> wrote:

> Hello.
>
> The behaviour we observed was that a zookeeper election took about 2s plus
> 1.5s for reading the zoo_data snapshot. During this time solr tried to
> establish connections to any zookeeper in the ensemble but only succeeded
> once a leader was elected *and* that leader was done reading the snapshot.
> Solr document updates were slowed down during this time window.
>
> Is this expected to happen during and shortly after elections, that is
> zookeeper closing existing connections, rejecting new connections and thus
> stalling solr updates?
>
> Other than avoiding zookeeper elections, are there ways of reducing their
> impact on solr?
>
> Thanks,
>
> Christine
>
>
> zookeeper log extract
>
> 08:18:54,968 [QuorumCnxManager.java:762] Connection broken for id ...
> 08:18:56,916 [Leader.java:345] LEADING - LEADER ELECTION TOOK - 1941
> 08:18:56,918 [FileSnap.java:83] Reading snapshot ...
> ...
> 08:18:57,010 [NIOServerCnxnFactory.java:197] Accepted socket connection
> from ...
> 08:18:57,010 [NIOServerCnxn.java:354] Exception causing close of session
> 0x0 due to java.io.IOException: ZooKeeperServer not running
> 08:18:57,010 [NIOServerCnxn.java:1001] Closed socket connection for client
> ... (no session established for client)
> ...
> 08:18:58,496 [FileTxnSnapLog.java:240] Snapshotting: ... to ...
>
>
> solr log extract
>
> 08:18:54,968 [ClientCnxn.java:1085] Unable to read additional data from
> server sessionid ... likely server has closed socket, closing socket
> connection and attempting reconnect
> 08:18:55,068 [ConnectionManager.java:72] Watcher
> org.apache.solr.common.cloud.ConnectionManager@...
> name:ZooKeeperConnection Watcher:host1:port1,host2:port2,host3:port3,...
> got event WatchedEvent state:Disconnected type:None path:null path:null
> type:None
> 08:18:55,068 [ConnectionManager.java:132] zkClient has disconnected
> ...
> 08:18:55,961 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:55,961 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:55,962 [ClientCnxn.java:1085] Unable to read additional data from
> server sessionid ... likely server has closed socket, closing socket
> connection and attempting reconnect
> ...
> 08:18:56,714 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:56,715 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:56,715 [ClientCnxn.java:1085] Unable to read additional data from ...
> ...
> 08:18:57,640 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:57,641 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:57,641 [ClientCnxn.java:1085] Unable to read additional data from ...
> ...
> 08:18:58,352 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:58,353 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:58,353 [ClientCnxn.java:1085] Unable to read additional data from ...
> ...
> 08:18:58,749 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:58,749 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:58,751 [ClientCnxn.java:1207] Session establishment complete on
> server ... sessionid = ..., negotiated timeout = ...
> 08:18:58,751 ... [ConnectionManager.java:72] Watcher
> org.apache.solr.common.cloud.ConnectionManager@...
> name:ZooKeeperConnection
> Watcher:host1:port1,host2:port2,host3:port3,... got event WatchedEvent
> state:SyncConnected type:None path:null path:null type:None
>
>


-- 
- Mark



Re: indexing delay due to zookeeper election

2013-12-23 Thread Mark Miller
Interesting stuff! This is expected but not really something I have thought
about yet.

Can you file a JIRA issue? I think we want to try and tackle this with code.

We currently reject updates when we lose our connection to ZooKeeper. We
are pretty strict about this. I think you could reasonably be less strict
(eg not start rejecting updates for a few seconds).
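
Purely as an illustration of that idea (this is not actual Solr code, just a
sketch of the "grace period" behaviour described above):

  class ZkConnectionGrace {
      private static final long GRACE_MS = 5000;      // how long to tolerate a disconnect
      private volatile long disconnectedSince = -1L;  // -1 means currently connected

      void onDisconnect() { disconnectedSince = System.currentTimeMillis(); }

      void onReconnect() { disconnectedSince = -1L; }

      // Only start rejecting updates once the disconnect has lasted longer
      // than the grace period, instead of rejecting them immediately.
      boolean shouldRejectUpdates() {
          long since = disconnectedSince;
          return since != -1L && (System.currentTimeMillis() - since) > GRACE_MS;
      }
  }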

- Mark


On Mon, Dec 23, 2013 at 12:49 PM, Christine Poerschke (BLOOMBERG/ LONDON) <
cpoersc...@bloomberg.net> wrote:

> Hello.
>
> The behaviour we observed was that a zookeeper election took about 2s plus
> 1.5s for reading the zoo_data snapshot. During this time solr tried to
> establish connections to any zookeeper in the ensemble but only succeeded
> once a leader was elected *and* that leader was done reading the snapshot.
> Solr document updates were slowed down during this time window.
>
> Is this expected to happen during and shortly after elections, that is
> zookeeper closing existing connections, rejecting new connections and thus
> stalling solr updates?
>
> Other than avoiding zookeeper elections, are there ways of reducing their
> impact on solr?
>
> Thanks,
>
> Christine
>
>
> zookeeper log extract
>
> 08:18:54,968 [QuorumCnxManager.java:762] Connection broken for id ...
> 08:18:56,916 [Leader.java:345] LEADING - LEADER ELECTION TOOK - 1941
> 08:18:56,918 [FileSnap.java:83] Reading snapshot ...
> ...
> 08:18:57,010 [NIOServerCnxnFactory.java:197] Accepted socket connection
> from ...
> 08:18:57,010 [NIOServerCnxn.java:354] Exception causing close of session
> 0x0 due to java.io.IOException: ZooKeeperServer not running
> 08:18:57,010 [NIOServerCnxn.java:1001] Closed socket connection for client
> ... (no session established for client)
> ...
> 08:18:58,496 [FileTxnSnapLog.java:240] Snapshotting: ... to ...
>
>
> solr log extract
>
> 08:18:54,968 [ClientCnxn.java:1085] Unable to read additional data from
> server sessionid ... likely server has closed socket, closing socket
> connection and attempting reconnect
> 08:18:55,068 [ConnectionManager.java:72] Watcher
> org.apache.solr.common.cloud.ConnectionManager@...
> name:ZooKeeperConnection Watcher:host1:port1,host2:port2,host3:port3,...
> got event WatchedEvent state:Disconnected type:None path:null path:null
> type:None
> 08:18:55,068 [ConnectionManager.java:132] zkClient has disconnected
> ...
> 08:18:55,961 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:55,961 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:55,962 [ClientCnxn.java:1085] Unable to read additional data from
> server sessionid ... likely server has closed socket, closing socket
> connection and attempting reconnect
> ...
> 08:18:56,714 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:56,715 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:56,715 [ClientCnxn.java:1085] Unable to read additional data from ...
> ...
> 08:18:57,640 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:57,641 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:57,641 [ClientCnxn.java:1085] Unable to read additional data from ...
> ...
> 08:18:58,352 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:58,353 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:58,353 [ClientCnxn.java:1085] Unable to read additional data from ...
> ...
> 08:18:58,749 [ClientCnxn.java:966] Opening socket connection to server ...
> 08:18:58,749 [ClientCnxn.java:849] Socket connection established to ...
> 08:18:58,751 [ClientCnxn.java:1207] Session establishment complete on
> server ... sessionid = ..., negotiated timeout = ...
> 08:18:58,751 ... [ConnectionManager.java:72] Watcher
> org.apache.solr.common.cloud.ConnectionManager@...
> name:ZooKeeperConnection
> Watcher:host1:port1,host2:port2,host3:port3,... got event WatchedEvent
> state:SyncConnected type:None path:null path:null type:None
>
>


-- 
- Mark


indexing delay due to zookeeper election

2013-12-23 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hello.

The behaviour we observed was that a zookeeper election took about 2s plus 1.5s 
for reading the zoo_data snapshot. During this time solr tried to establish 
connections to any zookeeper in the ensemble but only succeeded once a leader 
was elected *and* that leader was done reading the snapshot. Solr document 
updates were slowed down during this time window.

Is this expected to happen during and shortly after elections, that is 
zookeeper closing existing connections, rejecting new connections and thus 
stalling solr updates?

Other than avoiding zookeeper elections, are there ways of reducing their 
impact on solr?

Thanks,

Christine


zookeeper log extract

08:18:54,968 [QuorumCnxManager.java:762] Connection broken for id ...
08:18:56,916 [Leader.java:345] LEADING - LEADER ELECTION TOOK - 1941
08:18:56,918 [FileSnap.java:83] Reading snapshot ...
...
08:18:57,010 [NIOServerCnxnFactory.java:197] Accepted socket connection from ...
08:18:57,010 [NIOServerCnxn.java:354] Exception causing close of session 0x0 
due to java.io.IOException: ZooKeeperServer not running
08:18:57,010 [NIOServerCnxn.java:1001] Closed socket connection for client ... 
(no session established for client)
...
08:18:58,496 [FileTxnSnapLog.java:240] Snapshotting: ... to ...


solr log extract

08:18:54,968 [ClientCnxn.java:1085] Unable to read additional data from server 
sessionid ... likely server has closed socket, closing socket connection and 
attempting reconnect
08:18:55,068 [ConnectionManager.java:72] Watcher 
org.apache.solr.common.cloud.ConnectionManager@... name:ZooKeeperConnection 
Watcher:host1:port1,host2:port2,host3:port3,... got event WatchedEvent 
state:Disconnected type:None path:null path:null type:None
08:18:55,068 [ConnectionManager.java:132] zkClient has disconnected
...
08:18:55,961 [ClientCnxn.java:966] Opening socket connection to server ... 
08:18:55,961 [ClientCnxn.java:849] Socket connection established to ...
08:18:55,962 [ClientCnxn.java:1085] Unable to read additional data from server 
sessionid ... likely server has closed socket, closing socket connection and 
attempting reconnect
...
08:18:56,714 [ClientCnxn.java:966] Opening socket connection to server ...
08:18:56,715 [ClientCnxn.java:849] Socket connection established to ...
08:18:56,715 [ClientCnxn.java:1085] Unable to read additional data from ...
...
08:18:57,640 [ClientCnxn.java:966] Opening socket connection to server ...
08:18:57,641 [ClientCnxn.java:849] Socket connection established to ... 
08:18:57,641 [ClientCnxn.java:1085] Unable to read additional data from ...
...
08:18:58,352 [ClientCnxn.java:966] Opening socket connection to server ...
08:18:58,353 [ClientCnxn.java:849] Socket connection established to ... 
08:18:58,353 [ClientCnxn.java:1085] Unable to read additional data from ...
...
08:18:58,749 [ClientCnxn.java:966] Opening socket connection to server ...
08:18:58,749 [ClientCnxn.java:849] Socket connection established to ...
08:18:58,751 [ClientCnxn.java:1207] Session establishment complete on server 
... sessionid = ..., negotiated timeout = ...
08:18:58,751 ... [ConnectionManager.java:72] Watcher
org.apache.solr.common.cloud.ConnectionManager@... name:ZooKeeperConnection
Watcher:host1:port1,host2:port2,host3:port3,... got event WatchedEvent 
state:SyncConnected type:None path:null path:null type:None



Re: Failure initializing default system SSL context

2013-12-23 Thread Michael Della Bitta
Here's the Tomcat 6 SSL HOWTO:
http://tomcat.apache.org/tomcat-6.0-doc/ssl-howto.html

Generally, Tomcat expects a keystore password of "changeit" and a key
password that matches the keystore, unless you configure it otherwise. You
can use a different keystore password, but the key and keystore password
must be the same.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Fri, Dec 20, 2013 at 10:34 PM, Shawn Heisey  wrote:

> On 12/20/2013 1:39 PM, Patel, Pritesh wrote:
> > I am using solr-solr4.5.1.jar with httpclient 4.3.  I put these jars in a
> > lib folder within WEB-INF of the war file that I am creating.  I deploy the
> > war to Tomcat 6.
> >
> > When I run the code, I get this error when I try to run a query to Solr.
> > This works when I use the solr-solr-3.6.1.jar but doesn't with
> > solr-solr-4.5.1.jar.
> >
> > What am I missing?
> >
> >
> > Exception
> > Dec 20, 2013 11:55:31 AM
> org.apache.solr.client.solrj.impl.HttpClientUtil createClient
> > INFO: Creating new http client,
> config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> > org.apache.http.conn.ssl.SSLInitializationException: Failure
> initializing default system SSL context
>
> 
>
> > Caused by: java.io.IOException: Keystore was tampered with, or password
> was incorrect
>
> 
>
> > Caused by: java.security.UnrecoverableKeyException: Password
> verification failed
>
> This is basically a Tomcat and/or Java issue.  Something in the system
> has told Tomcat that it should use an SSL context, but there's a problem
> with the Java keystore where it's trying to get the certificate for SSL.
>
> If you actually do intend to use SSL, you'll need to fix it.  If you
> don't, you'll need to turn it off.  I wish I knew enough about what to
> do, but I've barely ever touched Tomcat.
>
> Thanks,
> Shawn
>
>


question about synonymfilter

2013-12-23 Thread Giovanni Bricconi
Hello,

suppose I have this synonym:
abxpower => abx power

and suppose you are indexing "abxpower pipp".

From the analyzer I see that abxpower is split into two words, but the
second word "power" overlaps the next one:
text raw_bytes keyword position start end type positionLength
abxpower [61 62 78 70 6f 77 65 72] false 1 0 8 word 1
pipp [70 69 70 70] false 2 9 14 word 1


   SF
  text raw_bytes positionLength type start end position keyword
abx [61 62 78] 1 SYNONYM 0 8 1 false
pipp [70 69 70 70] 1 word 9 14 2 false
power [70 6f 77 65 72] 1 SYNONYM 9 14 2 false


Is this correct? I noticed that WordDelimiterFilter instead changes start,
end and position. This is what happens for "abx-power pippo":

 WDF
  text raw_bytes start end type position positionLength
abx [61 62 78] 0 3 word 1 1
power [70 6f 77 65 72] 4 9 word 2 1
pippo [70 69 70 70 6f] 10 15 word 3 1


Re: Importing from Multiple tables using Solr DIH

2013-12-23 Thread Ahmet Arslan
Hi

There are several ways to do it. One way is to create two entities at the 
same level and use the entity name to choose which one to run.

Request: command=full-import&entity=messages_test

[the example data-config.xml with the two sibling entities was stripped when 
this message was archived]

Another way: make the table name a variable in data-config.xml and change/set 
it (BLOB_TEST or BLOB_TEST1) from the request parameters, e.g. something like 
query="select * from ${dataimporter.request.tableName}" and then passing 
tableName=BLOB_TEST1 on the full-import request.

http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters





On Monday, December 23, 2013 5:40 AM, Souvik Chakraborty wrote:

Hi all,

I have the below configuration which is working absolutely fine.

Data-config.xml:

[the inline data-config.xml, which indexed the BLOB_TEST table, was stripped 
when this message was archived]

Now my requirement is that I have a similar table, BLOB_TEST1, which has the
same fields as BLOB_TEST.

I wish to index it the same way as I have done it for BLOB_TEST.

Can't figure out how to accomplish that.

Any help would be highly appreciated.



-Souvik


Custom PostFilter

2013-12-23 Thread Zwer
Hi Guys,

According to the article about Advanced Filter Caching, and a few examples of
how to implement a custom PostFilter in Solr, I implemented my own class that
extends ExtendedQueryBase and implements PostFilter.
All of the filtering functionality is implemented in the *collect* method of
my own org.apache.solr.search.DelegatingCollector subclass.
During testing I found performance hits, because the *collect* method is
called too many times.

Here is the question: is it possible to know how many documents were found
before/in the *collect* call, and to return true if that number is bigger
than some threshold?


Thanks in advance.
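
A minimal sketch of keeping a running count inside the collector itself
(method signatures are recalled from the Solr 4.x DelegatingCollector API, and
expensiveCheck() is only a placeholder for the real per-document test, so
adjust as needed):

  import java.io.IOException;
  import org.apache.solr.search.DelegatingCollector;

  public class ThresholdCollector extends DelegatingCollector {
      private final int threshold;
      private int matches = 0;

      public ThresholdCollector(int threshold) {
          this.threshold = threshold;
      }

      @Override
      public void collect(int doc) throws IOException {
          if (matches >= threshold) {
              // Past the threshold: skip the expensive check and just pass the doc through.
              super.collect(doc);
              return;
          }
          if (expensiveCheck(doc)) {
              matches++;
              super.collect(doc);
          }
      }

      // Placeholder for the application-specific filtering logic.
      private boolean expensiveCheck(int doc) {
          return true;
      }
  }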



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-PostFilter-tp4107925.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Call to Solr via TCP

2013-12-23 Thread Zwer
Thank you all for the responses.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Call-to-Solr-via-TCP-tp4105932p4107921.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: adding a node to SolrCloud

2013-12-23 Thread David Santamauro

On 12/22/2013 09:48 PM, Shawn Heisey wrote:

On 12/22/2013 2:10 PM, David Santamauro wrote:

My goal is to have a redundant copy of all 8 currently running, but
non-redundant shards. This setup (8 nodes with no replicas) was a test
and it has proven quite functional from a performance perspective.
Loading, though, takes almost 3 weeks so I'm really not in a position to
redesign the distribution, though I can add nodes.

I have acquired another resource, a very large machine that I'd like to
use to hold the replicas of the currently deployed 8-nodes.

I realize I can run 8 jetty/tomcats and accomplish my goal but that is a
maintenance headache and is really a last resort. I really would just
like to be able to deploy this big machine with 'numShards=8'.

Is that possible or do I really need to have 8 other nodes running?


You don't want to run more than one container or Solr instance per
machine.  Things can get very confused, and it's too much overhead.

>

With existing collections, you can simply run the CoreAdmin CREATE
action on the new node with more resources.

So you'd do something like this, once for each of the 8 existing parts:

http://newnode:port/solr/admin/cores?action=CREATE&name=collname_shard1_replica2&collection=collname&shard=shard1

It will automatically replicate the shard from its current leader.


Fantastic! Clearly my understanding of "collection" vs "core" vs "shard" 
was lacking, but now I see the relationship better.




One thing to be aware of: With 1.4TB of index data, it might be
impossible to keep enough of the index in RAM for good performance,
unless the machine has a terabyte or more of RAM.


Yes, I'm well aware of the performance implications, many of which are 
mitigated by 2TB of SSD and 512GB RAM.


Thanks for the nudge in the right direction. The first node/shard1 is 
replicating right now.


David





Re: Weird exception: SolrCore 'collection1' is not available due to init failure

2013-12-23 Thread Upayavira
*something* is still referring to collection1. Have you tried searching
through your SOLR_HOME dir for any references to collection1?

Upayavira

On Mon, Dec 23, 2013, at 08:44 AM, YouPeng Yang wrote:
> Hi users
> 
> I get a very weird problem with Solr 4.6.
> I just want to reload a core:
> http://10.7.23.125:8080/solr/admin/cores?action=RELOAD&core=reportCore_201210_r1
> 
> However it gives an exception [1]. According to the exception, the SolrCore
> 'collection1' does not exist. I created a default core, but not with the name
> 'collection1'.
> 
> I am not clear about the exception, and I thought that it should not occur.
> Is some additional setup needed to avoid this exception?
> 
> 
> [1]-
> 1016924 [http-bio-8080-exec-10] ERROR
> org.apache.solr.servlet.SolrDispatchFilter
> ?.null:org.apache.solr.common.SolrException: SolrCore 'collection1' is
> not
> available due to init failure: No such core: collection1
> at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:818)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:338)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
> at
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> at
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
> at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
> at
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.solr.common.SolrException: No such core:
> collection1
> at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:675)
> at
> org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:717)
> at
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:178)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
> ... 16 more


Weird exception: SolrCore 'collection1' is not available due to init failure

2013-12-23 Thread YouPeng Yang
Hi users

 I get a very weird problem with Solr 4.6.
 I just want to reload a core:
http://10.7.23.125:8080/solr/admin/cores?action=RELOAD&core=reportCore_201210_r1

 However it gives an exception [1]. According to the exception, the SolrCore
'collection1' does not exist. I created a default core, but not with the name
'collection1'.

 I am not clear about the exception, and I thought that it should not occur.
Is some additional setup needed to avoid this exception?


[1]-
1016924 [http-bio-8080-exec-10] ERROR
org.apache.solr.servlet.SolrDispatchFilter
?.null:org.apache.solr.common.SolrException: SolrCore 'collection1' is not
available due to init failure: No such core: collection1
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:818)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: No such core: collection1
at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:675)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:717)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:178)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
... 16 more