Re: how to get abortOnConfigurationError=false working

2013-01-18 Thread snake
If you look at the version info I posted, it seems to be a pretty old
version embedded in ColdFusion, 1.4 by the looks of it.

Regards
Russ Michaels
www.michaels.me.uk
www.cfmldeveloper.com - Free CFML hosting for developers
www.cfsearch.com - CF search engine
On Jan 17, 2013 9:33 PM, Alexandre Rafalovitch [via Lucene] 
ml-node+s472066n4034359...@n3.nabble.com wrote:

 Solr 4 most definitely ignores missing cores (just ran into that
 accidentally again myself). So, if you start Solr and a core directory is
 missing, it will survive (but complain).

 The other problem is what happens when a customer deletes the account and
 the core directory disappears in the middle of an open searcher. I would
 suggest some sort of pre-delete trigger that hits the Solr admin interface
 and unloads that core first.
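
 For example, such a trigger could call the CoreAdmin API before the
 directory is removed; a minimal sketch, with the host, port and core name
 as placeholders:

   http://localhost:8983/solr/admin/cores?action=UNLOAD&core=customer_core

 UNLOAD stops the core from serving requests and closes its index files, so
 the directory can then be deleted safely.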

 Regards,
Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Thu, Jan 17, 2013 at 4:03 PM, Yonik Seeley [hidden email] wrote:

  On Thu, Jan 17, 2013 at 3:40 PM, snake [hidden email] wrote:
   Ok, so is there any other way to stop this problem I am having, where
   any site can break Solr by deleting their collection?
   Seems odd everyone would vote to remove a feature that would make Solr
   more stable.
 
  I agree.
 
  abortOnConfigurationError was more about a single core... whether the
  core would still be loaded if there were config errors.
 
  There *should* be a way to still load other cores if one core has an
  error and is not loaded.  If there's not currently, then we should
  implement it.
 
  -Yonik
  http://lucidworks.com
 








Re: access matched token ids in the FacetComponent?

2013-01-18 Thread Dmitry Kan
Hello Mikhail,

 I have some relevant experience and am ready to help...
Thanks a lot!

 ...but I cannot quite get to the core problem. Could you please expand the
 description and/or provide a sample?

Sure. Among other fields not relevant to this discussion, we have two
fields: a text field (Contents; a searchable, tokenized and filtered field)
and a unique string field, the document id (or doc id for short). We search
on one field (Contents), but return facets on another (doc id). It is
absolutely legitimate that Solr simply counts the number of doc ids in the
result set, and since there is only one doc id per document, it sets the
facet count of that doc id to 1.

What we need the FacetComponent to report instead is the count of special
matches inside the document (one figure per document id). By special
matches we mean that we actually require the count of the sentences inside
each document where the hits were found. By now I know how to retrieve the
individual sentences using the Solr Highlighter, with the help of
hl.fragmenter=regex&hl.snippets=1&hl.regex.pattern=[REGEX_PATTERN_TO_IDENTIFY_A_SENTENCE].
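
For reference, a minimal SolrJ 4.x sketch of that request; the core URL,
query term and regex are placeholders, not our real values:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SentenceSnippets {
  public static void main(String[] args) throws SolrServerException {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("Contents:someterm");
    q.set("hl", "true");                      // enable highlighting
    q.set("hl.fl", "Contents");               // highlight the searched field
    q.set("hl.fragmenter", "regex");          // fragment on a sentence regex
    q.set("hl.snippets", "1");
    q.set("hl.regex.pattern", "[REGEX_PATTERN_TO_IDENTIFY_A_SENTENCE]");
    QueryResponse rsp = solr.query(q);
    // doc id -> field -> snippets; the snippets are the sentence fragments
    System.out.println(rsp.getHighlighting());
  }
}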

My thinking goes like this:

Implement functionality similar to that of
org.apache.lucene.search.highlight.Highlighter in the FacetComponent to
perform the counting of matches as described above. Substitute the
per-doc-id counts (1's) with the calculated sentence counts. Return the
updated facet results.

Regards,

Dmitry

On Tue, Jan 15, 2013 at 2:08 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Dmitry,

 I have some relevant experience and am ready to help, but I cannot quite
 get to the core problem. Could you please expand the description and/or
 provide a sample?


 On Tue, Jan 15, 2013 at 11:01 AM, Dmitry Kan solrexp...@gmail.com wrote:

  Hello!
 
  Is there a simple way of accessing the matched token ids in the
  FacetComponent? The use case is to text search on one field and facet on
  another. And in the facet counts we want to see the text hit counts.
  Can it be done via some other component / approach?
 
  Any input is greatly appreciated.
 
  Dmitry
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



Re: using PositionIncrementAttribute to increment certain term positions to large values

2013-01-18 Thread Dmitry Kan
Hi,

For the sake of completeness: I was able to fix the highlighter to work
with token matches that go beyond the length of the text field. The
solution was to apply a modulo to the matched token positions when they
exceed the length of the text.

Dmitry

On Thu, Dec 27, 2012 at 10:13 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Hi,

 answering my own question for the record: the experiments show that the
 described functionality is achievable with a TokenFilter implementation.
 The only caveat is that the Highlighter component stops working properly
 if the match position goes beyond the length of the text field.

 As for the performance, no major delays compared to the original proximity
 search implementation have been noticed.

 Best,

 Dmitry Kan


 On Wed, Dec 19, 2012 at 10:53 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Dear list,

 We are currently evaluating proximity searches ("term1 term2"~slope) for
 a specific use case. In particular, each document contains artificial
 delimiter characters (one character between each pair of sentences in the
 text). Our goal is to hit the sentences individually for any proximity
 search and avoid cross-sentence-boundary matches.

 We figured that by using PositionIncrementAttribute as a field in a
 descendant of the TokenFilter class, it is possible to set the position
 increment of each artificial character (which is a term in Lucene/Solr
 notation) to an arbitrarily large number. Thus any proximity search with a
 reasonably small slope value should automatically hit within the sentence
 boundaries.
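
 A minimal sketch of the filter we have in mind; the delimiter term and the
 gap size here are placeholders of ours, not settled values:

 import java.io.IOException;
 import org.apache.lucene.analysis.TokenFilter;
 import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
 import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

 public final class SentenceGapFilter extends TokenFilter {
   private static final String DELIMITER = "#";  // artificial sentence delimiter
   private static final int GAP = 10000;         // arbitrarily large increment
   private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
   private final PositionIncrementAttribute posIncAtt =
       addAttribute(PositionIncrementAttribute.class);

   public SentenceGapFilter(TokenStream input) { super(input); }

   @Override
   public boolean incrementToken() throws IOException {
     if (!input.incrementToken()) return false;
     if (DELIMITER.equals(termAtt.toString())) {
       // Push everything after the delimiter far away in position space,
       // so small-slope proximity queries cannot cross the sentence boundary.
       posIncAtt.setPositionIncrement(GAP);
     }
     return true;
   }
 }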

 Does this sound like the right way to tackle the problem? Are there any
 performance costs involved?

 Thanks in advance for any input,

 Dmitry Kan





Re: build CMIS compatible Solr

2013-01-18 Thread Upayavira
A colleague of mine, when I was working for Sourcesense, made a CMIS
plugin for Solr. It was one-way; we used it to index content out of
Alfresco into Solr. I can't search for it right now; let me know if you
can't find it.

Upayavira

On Fri, Jan 18, 2013, at 05:35 AM, Nicholas Li wrote:
 I want to make something like Alfresco, but not having that many
 features.
 And I'd like to utilise the searching ability of Solr.
 
 On Fri, Jan 18, 2013 at 4:11 PM, Gora Mohanty g...@mimirtech.com wrote:
 
  On 18 January 2013 10:36, Nicholas Li nicholas...@yarris.com wrote:
   hi
  
   I am new to Solr and I would like to use Solr as my document server plus
   search engine. But Solr is not CMIS compatible (while it should not be, as
   it is not built as a pure document management server). In that sense, I
   would build another layer on top of Solr so that the exposed interface
   would be CMIS compatible.
  [...]
 
  May I ask why? Solr is designed to be a search engine,
  which is a very different beast from a document repository.
  In the open-source world, Alfresco ( http://www.alfresco.com/ )
  already exists, can index into Solr, and supports CMIS-based
  access.
 
  Regards,
  Gora
 


Re: Solr getting scores of multiple core queries?

2013-01-18 Thread denl0
OK, since I found out it didn't work, I have merged my two cores back into
one. But the scores still don't work on the join?





ConcurrentModificationException in Solr 3.6.1

2013-01-18 Thread mechravi25
Hi all,


I am using Solr 3.6.1. I am sending a set of requests to Solr
simultaneously. When I check the log file, I notice the exception stack
trace below:


SEVERE: java.util.ConcurrentModificationException
at 
java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:761)
at java.util.LinkedList$ListItr.next(LinkedList.java:696)
at
org.apache.solr.highlight.SolrHighlighter.getHighlightFields(SolrHighlighter.java:106)
at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:369)
at
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

When I searched through the Solr issues, I found the following two URLs:

https://issues.apache.org/jira/browse/SOLR-2684
https://issues.apache.org/jira/browse/SOLR-3790

The stack trace given in the second URL coincides with the one given above,
so I have applied the code change given in the link below:
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java?r1=1229401&r2=1231606&diff_format=h

The first URL's stack trace seems to be different.
I have two questions here: 1) Why does this exception stack trace occur?
2) Is there any other patch/solution available to overcome this exception?
Please guide me.















AW: ConcurrentModificationException in Solr 3.6.1

2013-01-18 Thread André Widhani
This should be fixed in 3.6.2, which has been available since Dec 25.

From the release notes:

Fixed ConcurrentModificationException during highlighting, if all fields were 
requested.

André


From: mechravi25 [mechrav...@yahoo.co.in]
Sent: Friday, 18 January 2013 11:10
To: solr-user@lucene.apache.org
Subject: ConcurrentModificationException in Solr 3.6.1



Re: Large data importing getting rollback with solr

2013-01-18 Thread Gora Mohanty
On 18 January 2013 15:04, ashimbose ashimb...@gmail.com wrote:
 Hi Gora,

 Thank you for your reply again,

 Joining is not possible in my case, because there is no relation between
 the tables. Is joining possible without any relation in this Solr case?

No, one needs some kind of a relationship to join.

 I am really going through a hard time. It's also really tough for me to
 do an alternative process, because I am very new to Solr.

 My requirement is that I need all text, varchar or char type data from my
 data source sampleDB. It's a huge data source; it may come to TBs and
 more than 1000 tables.

This will be quite difficult to benchmark, architect, and set up,
especially if you are not experienced with Solr. Again, it might
be best to re-examine your assumptions. Could you describe
what you are trying to do, rather than jumping to a solution?
What kind of search requirements do you have that need data
from a thousand tables per Solr document? How is it that the
data tables have no relationships, where presumably the fields
put into each Solr document will have something in common?
What kind of queries are you planning to run?

Even if you are certain that you want to go down this route, it
would make sense to approach this in a phased manner. Become
familiar with Solr indexing using just a few tables, then extend
that to more tables. Benchmark, and use the results to plan the
sort of architecture you will need to build.

 Which one is the better solution:
 1) Having multiple root entities and indexing each separately
 2) Having multiple data-import requestHandlers

 Can you give me an example and the procedure for achieving those? What
 changes do I need to make?
[...]

You should really try and get familiar with the basics of Solr
indexing first, but here is a brief outline:
* Each Solr DIH configuration file has only one <document>
  tag, but can have multiple data sources, multiple root entities,
  and nested entities. In your case, you could use multiple
  data sources to spread the load over multiple databases,
  each holding a subset of the tables. It looks like you are
  already using multiple root entities, as all your entities are
  distinct. Thus, instead of importing all the entities at once,
  which is what /dataimport?command=full-import does, you
  could import them in batches, e.g.,
  /dataimport?command=full-import&entity=CUSTOMER
  would import only the CUSTOMER entity,
  /dataimport?command=full-import&entity=CUSTOMER&entity=SHOP
  would import the CUSTOMER and SHOP entities, and so on.

* I have never tried this, but one can set up multiple request handlers
  in solrconfig.xml, one for each DIH instance that one plans to run.
  These can run in parallel, rather than the sequential indexing of
  root entities in a single DIH instance.
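
For the second option, a sketch of what the solrconfig.xml entries might
look like; the handler names and DIH config file names here are
illustrative, not prescribed:

<requestHandler name="/dataimport-customer"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-customer.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport-shop"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-shop.xml</str>
  </lst>
</requestHandler>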

Regards,
Gora


RE: Field Collapsing - Anything in the works for multi-valued fields?

2013-01-18 Thread David Parks
If I understand the reading, you've suggested that I index the vendor names
as their own documents (currently this is a multi-valued field on each
document).

Each such vendor document would just have a single valued 'name' field.

Each normal product document would contain a multi-valued field that is a
list of vendor document IDs and that we use to join the query results with
the vendor documents.

I presume this means that I would have some kind of dynamic field created
from the join which I could use as the 'group.field' value? 

I didn't quite follow the last point.



-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Friday, January 18, 2013 9:34 AM
To: solr-user@lucene.apache.org
Subject: Re: Field Collapsing - Anything in the works for multi-valued
fields?

Hi,

Instead of the multi-valued fields, would a parent-child setup work for you here?

See http://search-lucene.com/?q=solr+join&fc_type=wiki
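
For example, with vendors indexed as their own documents, the query-time
join syntax looks roughly like this (the field names are hypothetical, just
to show the shape):

q={!join from=id to=vendorIds}vendorName:acme

i.e., find the vendor documents matching the subquery, then return the
product documents whose multi-valued vendorIds field references them. This
is only a sketch of the join mechanics, not a full answer to the grouping
question.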

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Jan 17, 2013 at 8:04 PM, David Parks davidpark...@yahoo.com wrote:

 The documents are individual products which come from 1 or more vendors.
 Example: a 'toy spiderman doll' is sold by 2 vendors, that is 1 document.
 Most fields are multi-valued (short_description from each of the 2
 vendors; long_description, product_name, vendor, etc. the same).

 I'd like to collapse on the vendor in an attempt to ensure that vast 
 collections of books, music, and movies, by just a few vendors, don't 
 overwhelm the results simply due to the fact that they have every 
 search term imaginable due to the sheer volume of books, CDs, and 
 DVDs, in relation to other product items.

 But in this case there are clearly 1..N vendors per document, solidly
 a multi-valued field. And it's hard to put a cap on the maximum number of
 vendors possible.

 Thanks,
 Dave


 -Original Message-
 From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
 Sent: Friday, January 18, 2013 2:32 AM
 To: solr-user
 Subject: Re: Field Collapsing - Anything in the works for multi-valued 
 fields?

 David,

 What's the documents and the field? It can help to suggest workaround.


 On Thu, Jan 17, 2013 at 5:51 PM, David Parks davidpark...@yahoo.com
 wrote:

  I want to configure Field Collapsing, but my target field is 
  multi-valued (e.g. the field I want to group on has a variable # of 
  entries per document, 1-N entries).
 
  I read on the wiki (http://wiki.apache.org/solr/FieldCollapsing) 
  that grouping doesn't support multi-valued fields yet.
 
  Anything in the works on that front by chance?  Any common work-arounds?
 
 
 


 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com





Re: Large data importing getting rollback with solr

2013-01-18 Thread ashimbose
Hi Gora,

I will have archive data in some relational-like format, provided by the
client. It may have some relations, but I will not know them. I have to
index that data without restoring it to my SQL DB. That means I have to
read the data from the archive file directly, and this will be fully
automated.

Whatever data is in that archive file, I need to get all of it; SELECT *
FROM table is my only required query.
The archive file can have huge tables and data, but there will be only one
data source.
Let me know if you need any other information to help me...

Thanks & Regards,
Ashim





indexed file size is doubled in Solr 3.6.1 slave after Replication

2013-01-18 Thread mechravi25
Hi all,

I am using Solr 3.6.1 and have both a master and a slave instance.
I noticed this whenever I make some configuration changes and then restart
the server (both master and slave instances)
to replicate the changed index (after indexing on the master) to the slave.
When I perform the replication, I see that the document count between the
master and the slave remains the same, but the size of the index files
seems to be doubled on the slave (400MB) when compared to the master (200MB).

I tried refreshing the page, but it remains the same. I also see that
replication.properties gets created automatically to point to the
current index folder after a few replications.
Is there any reason why the index file size gets doubled from master to
slave? This scenario occurs only when I make a configuration change on
master and slave and then replicate data from master to slave. To overcome
this, I delete the index folder on the slave and then replicate; then the
size remains the same on both master and slave. After that, for the rest
of the time, this scenario does not occur. It only happens the first time
replication runs after configuration changes.

Please tell me if there is any configuration change needed to overcome this
scenario, or is there any other reason for its occurrence?

Please guide me.










Re: ConcurrentModificationException in Solr 3.6.1

2013-01-18 Thread Sandeep Mestry
Hi there, I think André has already guided you on this in your earlier mail:


This should be fixed in 3.6.2, which has been available since Dec 25.

From the release notes:

Fixed ConcurrentModificationException during highlighting, if all fields
were requested.

André







Solr load balancer

2013-01-18 Thread Phil Hoy
Hi,

I would like to experiment with some custom load balancers to help with query
latency in the face of long GC pauses and the odd time-consuming query that we
need to be able to support. At the moment, setting the socket timeout via the
HttpShardHandlerFactory does help, but of course it can only be set to a length
of time as long as the most time-consuming query we are likely to receive.

For example perhaps a load balancer that sends multiple queries concurrently to 
all/some replicas and only keeps the first response might be effective. Or 
maybe a load balancer which takes account of the frequency of timeouts would be 
able to recognize zombies more effectively.
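
To make the first idea concrete, here is a client-side sketch of
first-response-wins fan-out in plain SolrJ; the class and names are mine,
and this is just the behaviour, not the proposed plugin itself:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FirstResponseBalancer {
  private final List<HttpSolrServer> replicas;
  private final ExecutorService pool = Executors.newCachedThreadPool();

  public FirstResponseBalancer(List<HttpSolrServer> replicas) {
    this.replicas = replicas;
  }

  public QueryResponse query(final SolrQuery q) throws Exception {
    List<Callable<QueryResponse>> tasks = new ArrayList<Callable<QueryResponse>>();
    for (final HttpSolrServer replica : replicas) {
      tasks.add(new Callable<QueryResponse>() {
        public QueryResponse call() throws Exception {
          return replica.query(q);  // every replica gets the same query
        }
      });
    }
    // invokeAny returns the first task that completes successfully and
    // cancels the rest, so a replica stuck in a GC pause is simply ignored.
    return pool.invokeAny(tasks);
  }
}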

To use alternative load balancer implementations cleanly, and without having to
hack Solr directly, I would need to make the existing LBHttpSolrServer and
HttpShardHandlerFactory more amenable to extension; I could then override the
default load balancer using Solr's plugin mechanism.

So my question is, if I made a patch to make the load balancer more pluggable, 
is this something that would be acceptable and if so what do I do next?

Phil


Re: access matched token ids in the FacetComponent?

2013-01-18 Thread Mikhail Khludnev
Dmitry,

It definitely seems like postprocessing the highlighter's output. Another
approach is:
- limit the number of occurrences of a word in a sentence to 1
- play with the facet-by-function patch
(https://issues.apache.org/jira/browse/SOLR-1581) combined with the tf()
function.

It doesn't seem like much help, though.

On Fri, Jan 18, 2013 at 12:42 PM, Dmitry Kan solrexp...@gmail.com wrote:

 that we actually require the count of the sentences inside
 each document where the hits were found.





-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: How to round solr score ?

2013-01-18 Thread Gora Mohanty
On 18 January 2013 18:26, Gustav xbihy...@sharklasers.com wrote:
 I have to bump this... is it possible to round Solr's score with any
 built-in query function?

I do not have a Solr index handy at the moment to check,
but it should be possible to do this with function queries.
Please see the rint() and query() functions at
http://wiki.apache.org/solr/FunctionQuery

Regards,
Gora


Re: Logging wrong exception

2013-01-18 Thread Gora Mohanty
On 18 January 2013 18:34, Muhzin R muhsinlo...@gmail.com wrote:
 Hi all, I'm trying to set the value of a field in my schema to null.The
 solr throws the following exception .
 

[...]

This is the relevant part of the error:

 INFO  - 2013-01-18 18:13:35.409;
 org.apache.solr.update.processor.LogUpdateProcessor; [core0] webapp=/solr
 path=/update params={wt=javabinversion=2} {} 0 3
 ERROR - 2013-01-18 18:13:35.409; org.apache.solr.common.SolrException;
 org.apache.solr.common.SolrException: [doc=10] missing required field:
 countryId
[...]

 __
 Even though I'm trying to modify a field other than countryId. FYI, I'm
 trying to do a partial update.
 The schema definition of countryId is:

 <field name="countryId" type="int" indexed="true" stored="true" required="true"/>

 Why is Solr logging the wrong exception?

Please show us the code that triggers this exception. Seems
like you are trying to do an update without providing a value
for a required field.

If you are using Solr 4.0, here is how to update only a specific field
in a document:
http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
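
For illustration, a sketch of such an atomic update in XML; the field name
and value here are placeholders:

<add>
  <doc>
    <field name="id">10</field>
    <!-- "set" replaces only this field; Solr rebuilds the rest of the
         document from its stored fields -->
    <field name="someOtherField" update="set">new value</field>
  </doc>
</add>

Note that this requires the other fields to be stored, and the rebuilt
document must still satisfy required fields such as countryId.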

Regards,
Gora


Re: access matched token ids in the FacetComponent?

2013-01-18 Thread Dmitry Kan
Mikhail,

Are you saying that it is not possible to access the matched term positions
in the FacetComponent? If that were possible (somewhere in the
StandardFacetsAccumulator class, where docids are available), then, knowing
the matched term positions, I could do some simple school math to
calculate the sentence counts per doc id.

Dmitry

On Fri, Jan 18, 2013 at 2:45 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Dmitry,

 It definitely seems like postprocessing the highlighter's output. Another
 approach is:
 - limit the number of occurrences of a word in a sentence to 1
 - play with the facet-by-function patch
 (https://issues.apache.org/jira/browse/SOLR-1581) combined with the tf()
 function.

 It doesn't seem like much help, though.

 On Fri, Jan 18, 2013 at 12:42 PM, Dmitry Kan solrexp...@gmail.com wrote:

  that we actually require the count of the sentences inside
  each document where the hits were found.
 




 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



Re: How to round solr score ?

2013-01-18 Thread Gustav
Hey Gora, thanks for the fast answer!

I had tried the rint(score) function before (it would be perfect in my
case), but it didn't work out. I guess it only works with indexed fields,
so I got the "sort param could not be parsed as a query, and is not a field
that exists in the index: rint(score)" error.

And with the query() function I didn't get any successful result...

I'm stuck in the same scenario as squaro.

If two docs have scores of 1.67989 and 1.6767, I would like to sort them by
price.

My sort rules are something like:
sort=score desc, price asc
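
In case it helps, one untested variant along the lines of Gora's pointers,
using query() so the sort has a parseable function, and a multiplier to
keep two decimal places of the score before rounding:

sort=rint(product(query($q),100)) desc,price asc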





SOLR 4.x: multiterm phrase inside proximity searches possible?

2013-01-18 Thread Dmitry Kan
Hello!

Does SOLR 4.x support / is going to support the multi-term phrase search
inside proximity searches?

To illustrate, we would like the following to work:

"\"a b\" c"~10

which would return hits with "a b" up to 10 tokens away from c, in no
particular order.

It looks like https://issues.apache.org/jira/browse/LUCENE-2754 implements
what we need on the Lucene side.

Regards,

Dmitry


Re: Solr cache considerations

2013-01-18 Thread Tomás Fernández Löbbe
No, the fieldValueCache is not used for resolving queries, only for
multi-token faceting, and apparently for the stats component too. The
document cache maintains in memory the stored content of the fields you are
retrieving or highlighting on. It'll hit if the same document matches the
query multiple times and the same fields are requested, but as Erick said,
it is important for cases when multiple components in the same request need
to access the same data.

I think soft committing every 10 minutes is totally fine, but you should
hard commit more often if you are going to be using the transaction log.
openSearcher=false essentially tells Solr not to open a new searcher
after the (hard) commit, so you won't see the newly indexed data and caches
won't be flushed. openSearcher=false makes sense when you are using
hard commits together with soft commits: as the soft commit is dealing
with opening/closing searchers, you don't need hard commits to do it.
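
For reference, a solrconfig.xml sketch of that combination; the intervals
are just the starting points Erick suggests below, not requirements:

<!-- hard commit: flush and roll the tlog, but keep the current searcher -->
<autoCommit>
  <maxTime>300000</maxTime>            <!-- 5 minutes -->
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- soft commit: open a new searcher so fresh documents become visible -->
<autoSoftCommit>
  <maxTime>15000</maxTime>             <!-- 15 seconds -->
</autoSoftCommit>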

Tomás


On Fri, Jan 18, 2013 at 2:20 AM, Isaac Hebsh isaac.he...@gmail.com wrote:

 Unfortunately, it seems (
 http://lucene.472066.n3.nabble.com/Nrt-and-caching-td3993612.html) that
 these caches are not per-segment. In this case, I want to (soft) commit
 less frequently. Am I right?

 Tomás, as the fieldValueCache is very similar to Lucene's FieldCache, I
 guess it contributes a lot to standard (not only faceted) query times. The
 SolrWiki claims that it is primarily used by faceting. What does that say
 about complex textual queries?

 documentCache:
 Erick, after a query is processed, don't some documents stay in
 the documentCache? Can't I use it to accelerate queries that should
 retrieve stored fields of documents? In this case, a big documentCache can
 hold more documents.

 About commit frequency:
 Hard commit: openSearcher=false seems like a nice solution. Where can I
 read about this? (I found nothing but one unexplained sentence in the
 SolrWiki.)
 Soft commit: In my case, the required index freshness is 10 minutes. The
 plan to soft commit every 10 minutes is similar to storing all of the
 documents in a queue (outside Solr), and indexing a bulk every 10
 minutes.

 Thanks.


 On Fri, Jan 18, 2013 at 2:15 AM, Tomás Fernández Löbbe 
 tomasflo...@gmail.com wrote:

  I think fieldValueCache is not per segment, only fieldCache is. However,
  unless I'm missing something, this cache is only used for faceting on
  multivalued fields
 
 
  On Thu, Jan 17, 2013 at 8:58 PM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   filterCache: This is bounded by 1M * (maxDoc) / 8 * (num filters in
   cache). Notice the /8. This reflects the fact that the filters are
   represented by a bitset on the _internal_ Lucene ID. UniqueId has no
   bearing here whatsoever. This is, in a nutshell, why warming is
   required, the internal Lucene IDs may change. Note also that it's
   maxDoc, the internal arrays have holes for deleted documents.
  
   Note this is an _upper_ bound; if there are only a few docs that
   match, the size will be (num of matching docs) * sizeof(int).
  
   fieldValueCache. I don't think so, although I'm a bit fuzzy on this.
   It depends on whether these are per-segment caches or not. Any per
   segment cache is still valid.
  
   Think of documentCache as intended to hold the stored fields while
   various components operate on it, thus avoiding repeatedly fetching
   the data from disk. It's _usually_ not too big a worry.
  
   About hard-commits once a day. That's _extremely_ long. Think instead
   of committing more frequently with openSearcher=false. If nothing
   else, you transaction log will grow lots and lots and lots. I'm
   thinking on the order of 15 minutes, or possibly even much less. With
   softCommits happening more often, maybe every 15 seconds. In fact, I'd
   start out with soft commits every 15 seconds and hard commits
   (openSearcher=false) every 5 minutes. The problem with hard commits
   being once a day is that, if for any reason the server is interrupted,
   on startup Solr will try to replay the entire transaction log to
   assure index integrity. Not to mention that your tlog will be huge.
   Not to mention that there is some memory usage for each document in
   the tlog. Hard commits roll over the tlog, flush the in-memory tlog
   pointers, close index segments, etc.
  
   Best
   Erick
  
   On Thu, Jan 17, 2013 at 1:29 PM, Isaac Hebsh isaac.he...@gmail.com
   wrote:
Hi,
   
 I am going to build a big Solr (4.0?) index, which holds some dozens of
 millions of documents. Each document has some dozens of fields, and one
 big textual field.
 The queries on the index are non-trivial, and a little bit long (might be
 hundreds of terms). No query is identical to another.

 Now, I want to analyze the cache performance (before setting up the whole
 environment), in order to estimate how much RAM I will need.

 filterCache:
 In my scenario, every query has some filters. Let's say 

Re: SOLR 4.x: multiterm phrase inside proximity searches possible?

2013-01-18 Thread Jack Krupansky
There is no regular expression support across terms in queries, just within 
a single term, which is what LUCENE-2754 does (SpanMultiTermQueryWrapper).


You can use the surround query parser to do span queries:
http://wiki.apache.org/solr/SurroundQueryParser

But, surround does not support regex terms, just wildcards.

-- Jack Krupansky

-Original Message- 
From: Dmitry Kan

Sent: Friday, January 18, 2013 8:59 AM
To: solr-user@lucene.apache.org
Subject: SOLR 4.x: multiterm phrase inside proximity searches possible?

Hello!

Does SOLR 4.x support / is going to support the multi-term phrase search
inside proximity searches?

To illustrate, we would like the following to work:

"\"a b\" c"~10

which would return hits with "a b" up to 10 tokens away from c, in no
particular order.

It looks like https://issues.apache.org/jira/browse/LUCENE-2754 implements
what we need on the Lucene side.

Regards,

Dmitry 



Re: SOLR 4.x: multiterm phrase inside proximity searches possible?

2013-01-18 Thread Dmitry Kan
Thanks, Jack.

I have been looking at the SurroundQueryParser and came across an old
thread [1] mentioning two drawbacks: term analysis is left out, and there
is no default search operator.
Do you know if this is still true?

[1] http://search-lucene.com/m/94ONm1KRuAv1/

On Fri, Jan 18, 2013 at 4:38 PM, Jack Krupansky j...@basetechnology.com wrote:

 There is no regular expression support across terms in queries, just
 within a single term, which is what LUCENE-2754 does
 (SpanMultiTermQueryWrapper).

  You can use the surround query parser to do span queries:
  http://wiki.apache.org/solr/SurroundQueryParser

 But, surround does not support regex terms, just wildcards.

 -- Jack Krupansky

 -Original Message- From: Dmitry Kan
 Sent: Friday, January 18, 2013 8:59 AM
 To: solr-user@lucene.apache.org
 Subject: SOLR 4.x: multiterm phrase inside proximity searches possible?


 Hello!

 Does SOLR 4.x support / is going to support the multi-term phrase search
 inside proximity searches?

 To illustrate, we would like the following to work:

  "\"a b\" c"~10

  which would return hits with "a b" up to 10 tokens away from c, in no
  particular order.

  It looks like https://issues.apache.org/jira/browse/LUCENE-2754
  implements what we need on the Lucene side.

 Regards,

 Dmitry



Re: SOLR 4.x: multiterm phrase inside proximity searches possible?

2013-01-18 Thread Jack Krupansky

Unfortunately, yes.

-- Jack Krupansky

-Original Message- 
From: Dmitry Kan

Sent: Friday, January 18, 2013 9:42 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4.x: multiterm phrase inside proximity searches possible?

Thanks, Jack.

I have been looking at SurroundQueryParser and came across an old thread
[1], mentioning two drawbacks: term analysis is left out and no default
search operator.
Do you know, if this is still true?

[1] http://search-lucene.com/m/94ONm1KRuAv1/

On Fri, Jan 18, 2013 at 4:38 PM, Jack Krupansky j...@basetechnology.com wrote:



There is no regular expression support across terms in queries, just
within a single term, which is what LUCENE-2754 does
(SpanMultiTermQueryWrapper).

You can use the surround query parser to do span queries:
http://wiki.apache.org/solr/SurroundQueryParser

But, surround does not support regex terms, just wildcards.

-- Jack Krupansky

-Original Message- From: Dmitry Kan
Sent: Friday, January 18, 2013 8:59 AM
To: solr-user@lucene.apache.org
Subject: SOLR 4.x: multiterm phrase inside proximity searches possible?


Hello!

Does SOLR 4.x support / is going to support the multi-term phrase search
inside proximity searches?

To illustrate, we would like the following to work:

"\"a b\" c"~10

which would return hits with "a b" up to 10 tokens away from c, in no
particular order.

It looks like https://issues.apache.org/jira/browse/LUCENE-2754 implements
what we need on the Lucene side.

Regards,

Dmitry





Re: SOLR 4.x: multiterm phrase inside proximity searches possible?

2013-01-18 Thread Dmitry Kan
That is good to know, thanks!

Taking the alternative (LUCENE-2754): somehow applying the most recent
patch attached to the JIRA issue wasn't successful; not sure why. I'd guess
that, since it patches Lucene, one would still need to wire this into Solr,
or am I missing something?

Dmitry

On Fri, Jan 18, 2013 at 4:44 PM, Jack Krupansky j...@basetechnology.com wrote:

 Unfortuntaely, yes.


 -- Jack Krupansky

 -Original Message- From: Dmitry Kan
 Sent: Friday, January 18, 2013 9:42 AM
 To: solr-user@lucene.apache.org
 Subject: Re: SOLR 4.x: multiterm phrase inside proximity searches possible?


 Thanks, Jack.

 I have been looking at SurroundQueryParser and came across an old thread
 [1], mentioning two drawbacks: term analysis is left out and no default
 search operator.
 Do you know, if this is still true?

 [1] http://search-lucene.com/m/94ONm1KRuAv1/

 On Fri, Jan 18, 2013 at 4:38 PM, Jack Krupansky j...@basetechnology.com wrote:

  There is no regular expression support across terms in queries, just
 within a single term, which is what LUCENE-2754 does
 (SpanMultiTermQueryWrapper).

 You can use the surround query parser to do span queries:
 http://wiki.apache.org/solr/SurroundQueryParser
 


 But, surround does not support regex terms, just wildcards.

 -- Jack Krupansky

 -Original Message- From: Dmitry Kan
 Sent: Friday, January 18, 2013 8:59 AM
 To: solr-user@lucene.apache.org
 Subject: SOLR 4.x: multiterm phrase inside proximity searches possible?


 Hello!

 Does SOLR 4.x support / is going to support the multi-term phrase search
 inside proximity searches?

 To illustrate, we would like the following to work:

 "a b" c~10

 which would return hits with "a b" 10 tokens away from c in no particular
 order.

 It looks like https://issues.apache.org/jira/browse/LUCENE-2754 implements

 what we need on the Lucene side.

 Regards,

 Dmitry





Re: SOLR 4.x: multiterm phrase inside proximity searches possible?

2013-01-18 Thread Jack Krupansky

LUCENE-2754 is already in Lucene 4.0 - SpanMultiTermQueryWrapper.

-- Jack Krupansky
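
For anyone landing on this thread, a minimal Lucene 4.x sketch of what the
wrapper enables (field and terms are illustrative, not from the original
mails):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// "a b" as an ordered, zero-slop span (the phrase part)
SpanQuery phrase = new SpanNearQuery(new SpanQuery[] {
    new SpanTermQuery(new Term("text", "a")),
    new SpanTermQuery(new Term("text", "b")) }, 0, true);

// a multi-term (wildcard) query made span-compatible by the wrapper
SpanQuery wildcard = new SpanMultiTermQueryWrapper<WildcardQuery>(
    new WildcardQuery(new Term("text", "c*")));

// the phrase within 10 positions of the wildcard match, in any order
SpanQuery proximity = new SpanNearQuery(
    new SpanQuery[] { phrase, wildcard }, 10, false);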

-Original Message- 
From: Dmitry Kan

Sent: Friday, January 18, 2013 9:50 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4.x: multiterm phrase inside proximity searches possible?

That is good to know, thanks!

Taking the alternative (LUCENE-2754): somehow applying the most recent
patch attached to the jira wasn't successful, not sure why. I'd guess,
since it patches Lucene, one would still need to wire this to Solr or am I
missing something?

Dmitry

On Fri, Jan 18, 2013 at 4:44 PM, Jack Krupansky j...@basetechnology.com wrote:



Unfortunately, yes.


-- Jack Krupansky

-Original Message- From: Dmitry Kan
Sent: Friday, January 18, 2013 9:42 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4.x: multiterm phrase inside proximity searches 
possible?



Thanks, Jack.

I have been looking at SurroundQueryParser and came across an old thread
[1], mentioning two drawbacks: term analysis is left out and no default
search operator.
Do you know, if this is still true?

[1] http://search-lucene.com/m/94ONm1KRuAv1/


On Fri, Jan 18, 2013 at 4:38 PM, Jack Krupansky j...@basetechnology.com wrote:

 There is no regular expression support across terms in queries, just

within a single term, which is what LUCENE-2754 does
(SpanMultiTermQueryWrapper).

You can use the surround query parser to do span queries:
http://wiki.apache.org/solr/SurroundQueryParser



But, surround does not support regex terms, just wildcards.

-- Jack Krupansky

-Original Message- From: Dmitry Kan
Sent: Friday, January 18, 2013 8:59 AM
To: solr-user@lucene.apache.org
Subject: SOLR 4.x: multiterm phrase inside proximity searches possible?


Hello!

Does SOLR 4.x support / is going to support the multi-term phrase search
inside proximity searches?

To illustrate, we would like the following to work:

"a b" c~10

which would return hits with "a b" 10 tokens away from c in no particular
order.

It looks like https://issues.apache.org/jira/browse/LUCENE-2754 implements

what we need on the Lucene side.

Regards,

Dmitry








Re: SOLR 4.x: multiterm phrase inside proximity searches possible?

2013-01-18 Thread Dmitry Kan
Yep, that's my issue: we still use solr 3.4.

On Fri, Jan 18, 2013 at 4:57 PM, Jack Krupansky j...@basetechnology.com wrote:

 LUCENE-2754 is already in Lucene 4.0 - SpanMultiTermQueryWrapper.


 -- Jack Krupansky

 -Original Message- From: Dmitry Kan
 Sent: Friday, January 18, 2013 9:50 AM

 To: solr-user@lucene.apache.org
 Subject: Re: SOLR 4.x: multiterm phrase inside proximity searches possible?

 That is good to know, thanks!

 Taking the alternative (LUCENE-2754): somehow applying the most recent
 patch attached to the jira wasn't successful, not sure why. I'd guess,
 since it patches Lucene, one would still need to wire this to Solr or am I
 missing something?

 Dmitry

 On Fri, Jan 18, 2013 at 4:44 PM, Jack Krupansky j...@basetechnology.com wrote:

  Unfortunately, yes.


 -- Jack Krupansky

 -Original Message- From: Dmitry Kan
 Sent: Friday, January 18, 2013 9:42 AM
 To: solr-user@lucene.apache.org
 Subject: Re: SOLR 4.x: multiterm phrase inside proximity searches
 possible?


 Thanks, Jack.

 I have been looking at SurroundQueryParser and came across an old thread
 [1], mentioning two drawbacks: term analysis is left out and no default
 search operator.
 Do you know, if this is still true?

 [1] http://search-lucene.com/m/94ONm1KRuAv1/
 

 On Fri, Jan 18, 2013 at 4:38 PM, Jack Krupansky j...@basetechnology.com wrote:


  There is no regular expression support across terms in queries, just

 within a single term, which is what LUCENE-2754 does
 (SpanMultiTermQueryWrapper).

 You can use the surround query parser to do span queries:
 http://wiki.apache.org/solr/SurroundQueryParser
 

 


 But, surround does not support regex terms, just wildcards.

 -- Jack Krupansky

 -Original Message- From: Dmitry Kan
 Sent: Friday, January 18, 2013 8:59 AM
 To: solr-user@lucene.apache.org
 Subject: SOLR 4.x: multiterm phrase inside proximity searches possible?


 Hello!

 Does SOLR 4.x support / is going to support the multi-term phrase search
 inside proximity searches?

 To illustrate, we would like the following to work:

 "a b" c~10

 which would return hits with "a b" 10 tokens away from c in no particular
 order.

 It looks like https://issues.apache.org/jira/browse/LUCENE-2754 implements

 what we need on the Lucene side.

 Regards,

 Dmitry







Re: Questions about boosting

2013-01-18 Thread Shawn Heisey

On 1/18/2013 12:32 AM, Mikhail Khludnev wrote:

Colleagues,
fwiw bq is a DisMax parser feature. Shawn, to approach the boosting syntax
with the standard parser you need something like q=foo:bar ip:sc^1000.
Specifying ^1000 in bq makes no sense ever. If you show query params and
debugQuery output, it would be much easier for us to help you.
PS omitting termfreqs and positions doesn't impact query-time boosting
ever. The closest caveat is that disabling norms indexing kills _index_-time
boosting.


Ah!  As soon as I changed to my edismax handler, suddenly it started 
working!  I was doing all my tests with /select.


Now to work out what the boost factor should be.  A value of 0.25 seems 
like it might produce good results. I was surprised, I thought it would 
require a higher value.  Makes me think that this is not a 
multiplicative boost.


Thanks,
Shawn



SOLR-1604

2013-01-18 Thread Dmitry Kan
Hello!

Is there some activity on SOLR-1604? Can one of the contributors answer two
simple questions?
https://issues.apache.org/jira/browse/SOLR-1604?focusedCommentId=13557053&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13557053

Regards,

Dmitry


Re: How to round solr score ?

2013-01-18 Thread Gora Mohanty
On 18 January 2013 19:18, Gustav xbihy...@sharklasers.com wrote:
 Hey Gora, thanks for the fast answer!

 I had tried the rint(score) function before (it would be perfect in my case)
 but it didn't work out; I guess it only works with indexed fields, so I got
 the "sort param could not be parsed as a query, and is not a field that
 exists in the index: rint(score)" error.

 And with the query() function I didn't get any successful result...

 I'm stuck in the same scenario as squaro.

 If two docs have scores of 1.67989 and 1.6767, I would like to sort them by
 price.

 My sort rules are something like:
 sort=score desc, price asc

You have to use rint() in combination with query().

If I understand your requirements correctly, something along the lines
below should work, where one is searching for term in the field text:
http://localhost:8983/solr/select/?defType=func&q=rint(query({!v=text:term}))&fl=score,*&sort=score desc,price asc
The score is displayed in the returned fields to demonstrate that
it has been rounded off.

Regards,
Gora


Re: Questions about boosting

2013-01-18 Thread Walter Underwood

On Jan 17, 2013, at 10:53 PM, Shawn Heisey wrote:

 On 1/17/2013 11:41 PM, Walter Underwood wrote:
 As I understand it, the bq parameter is a full Lucene query, but only used 
 for ranking, not for selection. This is the complement of fq.
 
 You can use weighting:  provider:fred^8
 
 This will be affected by idf, so providers with fewer matches will have 
 higher weight than those with more matches. This is a bother, but the 
 idf-free approach requires Solr 4.0.
 
 I am doing my testing on Solr 4.1, so if you can give me the syntax for that, 
 I would appreciate it.  My production indexes are 3.5, but once we are 
 confident with the 4.1 dev system, we'll upgrade.
 
 The provider field has omitTermFreqAndPositions=true defined, but the 
 fields that typically get searched don't omit anything, so IDF probably still 
 applies in the aggregate.
 
 On a related note, I have rather extreme length variation in my fields, so I 
 see quite a lot of weird results due to very short metadata.  Is there any 
 way to lessen the impact of lengthNorm without eliminating it entirely?  If 
 not, is there any way to eliminate lengthNorm without also disabling 
 index-time boosts?  At this moment I am not doing index-time boosting, but 
 business requirements may change that in the future.
 
 Thanks,
 Shawn

I was experimenting with a boost function like this:

if(exists(query(provider:fred)), 5, 1)

That gives a constant boost if the term exists in the field, none if it does 
not. If you pass the provider in as a separate URL param, you could use 
parameter substitution.

if(exists(query(provider:$provider)), 5, 1)
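
Spelled out as a full request, that might look like the following (a sketch;
the handler, field and term are illustrative, and the multiplicative boost
parameter assumes the edismax parser):

http://localhost:8983/solr/select?defType=edismax&q=foo&provider=provider:fred&boost=if(exists(query($provider)),5,1)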

For length norms, you could try a different similarity class or write your own, 
changing Similarity.computeNorm().

wunder
--
Walter Underwood
wun...@wunderwood.org





n values in one fieldType

2013-01-18 Thread blopez
Hi guys,

I have some specific needs for an application. Each document (identified by
docId) has several items from the same type (each one of these items
contains 6 integer values). So each Solr doc has a docId and another
multiValued attribute.

<fields>
  <field name="docId" type="int"/>
  <field name="item" type="???" multiValued="true" />
</fields>

My problem is that I don't know what fieldType I should use to implement in
the 'item' attribute, because every input query will have the 6 integer
values I told you before, to recover the docs that contains EXACTLY the 6
values.

What do you think?

Borja.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/n-values-in-one-fieldType-tp4034552.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Questions about boosting

2013-01-18 Thread Shawn Heisey

On 1/18/2013 8:52 AM, Walter Underwood wrote:


On Jan 17, 2013, at 10:53 PM, Shawn Heisey wrote:


On 1/17/2013 11:41 PM, Walter Underwood wrote:

As I understand it, the bq parameter is a full Lucene query, but only used for 
ranking, not for selection. This is the complement of fq.

You can use weighting:  provider:fred^8

This will be affected by idf, so providers with fewer matches will have higher 
weight than those with more matches. This is a bother, but the idf-free 
approach requires Solr 4.0.


I am doing my testing on Solr 4.1, so if you can give me the syntax for that, I 
would appreciate it.  My production indexes are 3.5, but once we are confident 
with the 4.1 dev system, we'll upgrade.

The provider field has omitTermFreqAndPositions=true defined, but the fields 
that typically get searched don't omit anything, so IDF probably still applies in the 
aggregate.

On a related note, I have rather extreme length variation in my fields, so I 
see quite a lot of weird results due to very short metadata.  Is there any way 
to lessen the impact of lengthNorm without eliminating it entirely?  If not, is 
there any way to eliminate lengthNorm without also disabling index-time boosts? 
 At this moment I am not doing index-time boosting, but business requirements 
may change that in the future.

Thanks,
Shawn


I was experimenting with a boost function like this:

if(exists(query(provider:fred)), 5, 1)

That gives a constant boost if the term exists in the field, none if it does 
not. If you pass the provider in as a separate URL param, you could use 
parameter substitution.

if(exists(query(provider:$provider)), 5, 1)

For length norms, you could try a different similarity class or write your own, 
changing Similarity.computeNorm().


I tried a boost= parameter on my 4.1 server and got an error message in 
the response with the entire value of the parameter:


org.apache.solr.search.SyntaxError: Nested function query must use 
$param or {!v=value} forms. got 'if(exists(query(ip:sc)), 2, 1)'


I get a different error if I change it to bf instead of boost:

org.apache.solr.search.SyntaxError: Unexpected text after function: )

What am I doing wrong?

Thanks,
Shawn
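
For the archive: the error message names the accepted forms, so a hedged
rewrite of the same boost (untested here) would be either

boost=if(exists(query({!v='ip:sc'})),2,1)

or, via parameter dereferencing (bbq is an arbitrary parameter name):

boost=if(exists(query($bbq)),2,1)&bbq=ip:sc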



Re: Need 'stupid beginner' help with SolrCloud

2013-01-18 Thread Shawn Heisey

On 1/17/2013 9:24 PM, Mark Miller wrote:

There are a couple ways you can proceed. You can preconfigure some SolrCores in 
solr.xml. Even if you don't, you want a solr.xml, because that is where a lot 
of cloud properties are defined. Or you can use the collections API or the core 
admin API.

I guess I'd recommend the collections API.

You have a couple options for getting in config. I'd recommend using the ZkCli 
tool to upload each of your config sets: 
http://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper

After that, use the collections API to create the necessary cores on each node.

 Another option is to set up solr.xml like you would locally, then start with 
 -Dbootstrap_conf=true and it will duplicate your local config and collection 
setup in ZooKeeper.


I have a cloud up and running with one config and collection, and I 
think I understand how to create more of both.


Is the following the recommended way of choosing a collection with 
SolrJ's CloudSolrServer, or should I be doing something different?


server.setDefaultCollection("test1");

Any SolrJ examples for cloud-related tasks would be appreciated.

Thanks,
Shawn
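
For other beginners following along, the ZkCli upload plus collections API
creation that Mark describes looks roughly like this (a sketch against Solr
4.x; the zkhost, config name, paths and collection parameters are
illustrative):

java -classpath "example/solr-webapp/webapp/WEB-INF/lib/*" \
  org.apache.solr.cloud.ZkCLI -cmd upconfig \
  -zkhost zk1:2181 -confdir ./myconf -confname myconf

http://host1:8983/solr/admin/collections?action=CREATE&name=test1&numShards=2&replicationFactor=2&collection.configName=myconf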



Re: Using Solr Spatial in conjunction with HBASE/Hadoop

2013-01-18 Thread oakstream
Thanks guys!
David,

In general and in your opinion would Lucene Spatial be the way to go to
index hundreds of terabytes of spatial data that continually grows.  Mostly
point data, mostly structured, however, could be polygons.  The searches
would be within or contains in a polygon.  
Do you have any thoughts on using a NOSQL database (like Mongodb) or
something else comparable.  I need response times in the seconds.  My
thoughts are that I need some type of distributed system.  I was thinking
about SOLRCLOUD to solve this.  I'm fairly new to Lucene/Solr.Most of
the data is currently in HDFS/HBASE.  

I've investigated sharding Oracle and Postgres databases but this just
doesn't seem like the ideal solution and since all the data already exists
in HDFS, I'd like to build a solution that works on top of it but
real-time or as near as I can get.  

Anyways, I've read some of your work in the past and appreciate your input.  
I don't mind putting in some development work, just not sure the right
approach. 

Thanks for your time. I appreciate it!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-Solr-Spatial-in-conjunction-with-HBASE-Hadoop-tp4034307p4034639.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need 'stupid beginner' help with SolrCloud

2013-01-18 Thread Mark Miller

On Jan 18, 2013, at 1:40 PM, Shawn Heisey s...@elyograg.org wrote:

 I have a cloud up and running with one config and collection, and I think I 
 understand how to create more of both.
 
 Is the following the recommended way of choosing a collection with SolrJ's 
 CloudSolrServer, or should I be doing something different?
 
  server.setDefaultCollection("test1");

Yeah, thats fine if you only plan on working with one collection. Otherwise 
just pass collection=whatever as a param to override.

 Any SolrJ examples for cloud-related tasks would be appreciated.

I'll look at adding some to the wiki.

- Mark
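
A minimal SolrJ sketch of both approaches (the zkHost string and collection
names are illustrative; exception handling omitted):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
server.setDefaultCollection("test1");   // used when no collection param is given

SolrQuery q = new SolrQuery("*:*");
q.set("collection", "test2");           // per-request override, as Mark describes
QueryResponse rsp = server.query(q);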




Re: Long ParNew GC pauses - even when young generation is small

2013-01-18 Thread giltene
 product bragging alert 

We've got people running Solr on the Zing JVM at various places for exactly
this reason. A key side effect of running on Zing is the complete
elimination of GC effects, with no code changes or tuning needed.
   
So instead of wanting pauses of half a second or less, and settling for
pauses of 2 seconds or less (per your message), you can instead actually run
on a JVM with a GC behavior that drops noise to below 20 msec with Solr. And
you can get this the day you turn Zing on, without needing to know any more
about tuning, or having to make the various interplays or tradeoffs around
things like ParNew sizing, heap sizing, occupancy thresholds, and the 75
other flags that may strongly affect your user experience.
 
-- Gil. (CTO, Azul Systems).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Long-ParNew-GC-pauses-even-when-young-generation-is-small-tp4031110p4034645.html
Sent from the Solr - User mailing list archive at Nabble.com.


Question on Solr Velocity Example

2013-01-18 Thread O. Olson
Hi,

    I am new to Solr (and Velocity), and have downloaded Solr 4.0 from
http://lucene.apache.org/solr/downloads.html.
I started the example solr, and indexed the XML files in the /exampledocs
directory. Next, I pointed the browser to: http://localhost:8983/solr/browse
and I get the results along with the search and faceted search functionality.
I am interested in learning how this example works. I hope some of you can
help me with the following questions:

1. In this example, we seem to be using the Velocity templates in:
/example/solr/collection1/conf/velocity.
The overall page at http://localhost:8983/solr/browse seems to be generated
from browse.vm - which seems to include (parse) other templates. My question
here is that I see things like $response.response.clusters – where can I know
what properties the “response” object has, or the “clusters” object has? Also
there seem to be some methods like display_facet_query() – where is this
defined? Is there some documentation for this, or some way I can find this
out? I might need to modify these values, hence my question. (I am completely
new to Velocity – but I think I get some idea by looking at the templates.)

2. In the http://localhost:8983/solr/browse page, we have a list of Query
Facets. Right now I just see two: ipod and GB?
How are these values obtained? Do they come from elevate.xml? Here I see
ipod, but not GB.

I would appreciate any help on these questions. If the above description is
not clear please let me know.

Thank you,
O. O.


SolrCloud :: Distributed query processing

2013-01-18 Thread Mishkin, Ernest
Hello,

I'm trying to reconcile my understanding of how distributed queries are handled 
by SolrCloud with what I see in the server (tomcat running solr) logs.

The setup: Solr 4.0 GA, single collection, one shard, two nodes (master and 
replica), standalone zookeeper ensemble.

Client uses SolrJ CloudSolrServer to issue queries.

Looking at one of the solr instance's logs (same pattern for both master and 
replica) while repeatedly running a query, I sometimes see just one line:
INFO  [org.apache.solr.core.SolrCore   ] webapp=/solr path=/select 
params={start=0&q=my_query&wt=javabin&rows=10&version=2} hits=2 status=0 QTime=28

And sometimes I see the following 3 lines (for a single client request):
INFO  [org.apache.solr.core.SolrCore   ] webapp=/solr path=/select 
params={fl=user_id,score&shard.url=master_url|replica_url/&NOW=1358538200298&start=0&q=my_query&distrib=false&isShard=true&wt=javabin&fsv=true&rows=10&version=2} hits=2 status=0 QTime=14
INFO  [org.apache.solr.core.SolrCore   ] webapp=/solr path=/select 
params={shard.url=master_url|replica_url/&NOW=1358538200298&start=0&q=my_query&ids=229,118671&distrib=false&isShard=true&wt=javabin&rows=10&version=2} status=0 QTime=9
INFO  [org.apache.solr.core.SolrCore   ] webapp=/solr path=/select 
params={start=0&q=my_query&wt=javabin&rows=10&version=2} hits=2 status=0 QTime=107

I thought that the client simply picks a node (master or replica in this case) 
and that node will fully service the request given that it's a single shard 
setup. But apparently I'm missing something - please help me understand what.

Thanks,
Ernest






Re: SolrCloud :: Distributed query processing

2013-01-18 Thread Yonik Seeley
Hopefully the explanation here will shed some light on this:
https://issues.apache.org/jira/browse/SOLR-3912

-Yonik
http://lucidworks.com


On Fri, Jan 18, 2013 at 2:59 PM, Mishkin, Ernest
ernest_mish...@mcgraw-hill.com wrote:
 Hello,

 I'm trying to reconcile my understanding of how distributed queries are 
 handled by SolrCloud with what I see in the server (tomcat running solr) logs.

 The setup: Solr 4.0 GA, single collection, one shard, two nodes (master and 
 replica), standalone zookeeper ensemble.

 Client uses SolrJ CloudSolrServer to issue queries.

 Looking at one of the solr instance's logs (same pattern for both master and 
 replica) while repeatedly running a query, I sometimes see just one line:
 INFO  [org.apache.solr.core.SolrCore   ] webapp=/solr path=/select 
 params={start=0&q=my_query&wt=javabin&rows=10&version=2} hits=2 status=0 QTime=28

 And sometimes I see the following 3 lines (for a single client request):
 INFO  [org.apache.solr.core.SolrCore   ] webapp=/solr path=/select 
 params={fl=user_id,score&shard.url=master_url|replica_url/&NOW=1358538200298&start=0&q=my_query&distrib=false&isShard=true&wt=javabin&fsv=true&rows=10&version=2} hits=2 status=0 QTime=14
 INFO  [org.apache.solr.core.SolrCore   ] webapp=/solr path=/select 
 params={shard.url=master_url|replica_url/&NOW=1358538200298&start=0&q=my_query&ids=229,118671&distrib=false&isShard=true&wt=javabin&rows=10&version=2} status=0 QTime=9
 INFO  [org.apache.solr.core.SolrCore   ] webapp=/solr path=/select 
 params={start=0&q=my_query&wt=javabin&rows=10&version=2} hits=2 status=0 QTime=107

 I thought that the client simply picks a node (master or replica in this 
 case) and that node will fully service the request given that it's a single 
 shard setup. But apparently I'm missing something - please help me understand 
 what.

 Thanks,
 Ernest



 


Solr 4.0 doesn't send qt parameter to shards

2013-01-18 Thread Mike Schultz
Can someone explain the logic of not sending the qt parameter down to the
shards?

I see from here that qt is handled as a special case for ResultGrouping:
http://lucidworks.lucidimagination.com/display/solr/Result+Grouping
where there is a special shard.qt parameter.

in 3.x solrconfig.xml supports defining a list of SearchComponents on
handler by handler basis.  This flexibility goes away if qt isn't passed
down or am I missing something?

I'm using:
<requestDispatcher handleSelect="true"> for the legacy behavior.  We want to
be able to have a single endpoint (e.g. http://localhost:8983/solr/select)
and modify query processing by varying only the query parameters.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-doesn-t-send-qt-parameter-to-shards-tp4034653.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SolrCloud :: Distributed query processing

2013-01-18 Thread Mishkin, Ernest
Thanks Yonik, that issue is exactly the same as what I observed. Glad it's 
fixed in 4.1


-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Friday, January 18, 2013 3:10 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud :: Distributed query processing

Hopefully the explanation here will shed some light on this:
https://issues.apache.org/jira/browse/SOLR-3912

-Yonik
http://lucidworks.com




Re: n values in one fieldType

2013-01-18 Thread Mike Schultz
It depends on what kind of behavior you're looking for.

if for your queries the order of the 6 integer values doesn't matter you
could do:

<field name="item" type="tint" multiValued="true"/>

then you could query with ORed or ANDed integer values over that field.

If the order matters but you always query on the set of 6 values, then you
turn your six integers into a GUID or simply hex-encode them into a
single-valued string field.

Another possibility is to hex-encode the integers, separate them with
whitespace, and whitespace-tokenize.  Then you get a mixture of the two above,
but you can also specify some locality constraints, e.g. using phrase
queries, etc.

The answer really depends on the types of queries you need to be able to
respond to.

M
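
A tiny sketch of the single-valued encoding idea (Java; the values are
illustrative; fixed-width hex so equal sets of six compare exactly):

// Pack six ints into one fixed-width hex token for a single-valued string field.
int[] vals = {10, 42, 7, 99, 3, 2013};
StringBuilder sb = new StringBuilder();
for (int v : vals) {
  sb.append(String.format("%08x", v));  // 8 hex digits per int, order preserved
}
String item = sb.toString();  // index this token; build the same token at query time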



--
View this message in context: 
http://lucene.472066.n3.nabble.com/n-values-in-one-fieldType-tp4034552p4034662.html
Sent from the Solr - User mailing list archive at Nabble.com.


Distributed edismax query - which handler should be in the shards parameter?

2013-01-18 Thread Shawn Heisey
I have a handler on my broker core which has defType=edismax and a 
shards parameter in solrconfig.xml.  Currently the handler referenced on 
each shard in the shards parameter is a handler that does NOT have 
defType defined.  The defType parameter is not in the URL that the 
client sends.


It just occurred to me that perhaps the shard queries should be sent to 
a handler with defType=edismax ... is that right, or is my current 
approach OK?


I've had the edismax handler defined for a long time, but it's only been 
in the last few days that it's actually seeing any use.


Thanks,
Shawn


Re: Auto completion

2013-01-18 Thread Erik Hatcher
In the default /browse suggest, it's wired together with the 'name' field in a 
couple of places:

head.vm:
  'terms.fl': 'name',

suggest.vm:
  #foreach($t in $response.response.terms.name)

I'll aim to make this more dynamic in the future, but for now if you're 
adapting from what's there now to use a different field you'll need to hack 
those two spots.  That second spot is simply a path navigation reference in the 
terms component response that puts the field name as one level in the response 
structure.

So in your example, if you wanted to suggest from 'text', substitute 'text' for 
'name' in the spots just mentioned.

Erik
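
Concretely, for a field named 'text' the two edits Erik mentions would become:

head.vm:     'terms.fl': 'text',
suggest.vm:  #foreach($t in $response.response.terms.text)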



On Jan 11, 2013, at 01:21 , anurag.jain wrote:

 in solrconfig.xml 
 
 
   <str name="defType">edismax</str>
   <str name="qf">
     text^0.5 last_name^1.0 first_name^1.2 course_name^7.0 id^10.0
     branch_name^1.1 hq_passout_year^1.4
     course_type^10.0 institute_name^5.0 qualification_type^5.0
     mail^2.0 state_name^1.0
   </str>
   <str name="df">text</str>
   <str name="mm">100%</str>
   <str name="q.alt">*:*</str>
   <str name="rows">10</str>
   <str name="fl">*,score</str>

   <str name="mlt.qf">
     text^0.5 last_name^1.0 first_name^1.2 course_name^7.0 id^10.0
     branch_name^1.1 hq_passout_year^1.4
     course_type^10.0 institute_name^5.0 qualification_type^5.0
     mail^2.0 state_name^1.0
   </str>
   <str name="mlt.fl">text,last_name,first_name,course_name,id,branch_name,hq_passout_year,course_type,institute_name,qualification_type,mail,state_name</str>
   <int name="mlt.count">3</int>

   <str name="facet">on</str>
   <str name="facet.field">is_top_institute</str>
   <str name="facet.field">course_name</str>

   <str name="facet.range">cgpa</str>
   <int name="f.cgpa.facet.range.start">0</int>
   <int name="f.cgpa.facet.range.end">10</int>
   <int name="f.cgpa.facet.range.gap">2</int>
 
 
 
 
 and in schema.xml
 
 
 
   <field name="id" type="text_general" indexed="true" stored="true"
     required="true" multiValued="false" />
   <field name="first_name" type="text_general" indexed="false"
     stored="true"/>
   <field name="last_name" type="text_general" indexed="false"
     stored="true"/>
   <field name="institute_name" type="text_general" indexed="true"
     stored="true"/>
 ...
 ...
 ...


 <copyField source="first_name" dest="text"/>
 <copyField source="last_name" dest="text"/>
 <copyField source="institute_name" dest="text"/>
 ...
 ...
 ...
 
 
 so please now tell me what the JavaScript (terms.fl parameter) should be, and
 what to change in conf/velocity/head.vm, and also the 'name' reference in suggest.vm. 
 
 
 please reply .. and thanks for previous reply ..  :-)
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Auto-completion-tp4032267p4032450.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Velocity in Multicore

2013-01-18 Thread Erik Hatcher
Paul -

In case you haven't already sussed this one out already, the likely issue is 
that each core is separately configured and only the single core example 
collection1 core comes with the VelocityResponseWriter wired in fully.  You 
need these lines (paths likely need adjusting!) in your solrconfig.xml:

  <lib dir="../../../contrib/velocity/lib" regex=".*\.jar" />
  <lib dir="../../../dist/" regex="apache-solr-velocity-\d.*\.jar" />

and 

<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" 
startup="lazy"/>


I used the example-DIH, launching via (java 
-Dsolr.solr.home=./example-DIH/solr/ -jar start.jar), after adding the above 
to the db/conf/solrconfig.xml file and this works:

  
http://localhost:8983/solr/db/select?q=*:*&wt=velocity&v.template=hello&v.template.hello=Hello%20World!

Before getting VrW registered, Solr fell back to the XML response writer 
as you experienced.

Erik




On Jan 14, 2013, at 14:05 , Ramirez, Paul M (388J) wrote:

 Hi,
 
 I've been unable to get the velocity response writer to work in a multicore 
 environment. Working from the examples that are distributed with Solr I 
 simply started from the multicore example and added a hello.vm into 
 core0/conf/velocity directory. I then updated the solrconfig.xml to add a new 
 request handler as shown below. I've tried to use the v.base_dir to no 
 success. Essentially what I always end up with is the default solr response. 
 Has anyone been able to get the velocity response writer to work in a 
 multicore environment? If so, could you point me to the documentation on how 
 to do so.
 
 hello.vm
 
 Hello World!
 
 solrconfig.xml
 ===
 …
 <requestHandler name="/hello" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <!-- VelocityResponseWriter settings -->
     <str name="wt">velocity</str>
     <str name="v.template">hello</str>
     <!-- I've tried all the following in addition to not specifying any. -->
     <!-- <str name="v.base_dir">core0/conf/velocity</str> -->
     <!-- <str name="v.base_dir">conf/velocity</str> -->
     <!-- <str name="v.base_dir">multicore/core0/conf/velocity</str> -->
   </lst>
 </requestHandler>
 …
 
 
 
 Regards,
 Paul Ramirez



Re: Question on Solr Velocity Example

2013-01-18 Thread Erik Hatcher

On Jan 18, 2013, at 14:47 , O. Olson wrote:
 I am new to
 Solr (and Velocity), and have downloaded Solr 4.0 from 
 http://lucene.apache.org/solr/downloads.html.
 I started the example solr, and indexed the XML files in the /exampledocs
 directory. Next, I pointed the browser to: http://localhost:8983/solr/browse 
 and I get the results along with the search and faceted search functionality. 
 I
 am interested in learning how this example works. I hope some of you can help
 me with the following questions:  
  
 1.  In
 this example, we seem to be using the Velocity templates in: 
 /example/solr/collection1/conf/velocity.
 The overall page at http://localhost:8983/solr/browse seems to be generated 
 from browse.vm - which seems to include (parse) other
 templates. My question here is that I see things like 
 $response.response.clusters
 – Where can I know what properties the “response” object has, or the 
 “clusters”
 object has?

Great question.  $response is as described here 
http://wiki.apache.org/solr/VelocityResponseWriter#Velocity_Context

You can navigate Solr's javadocs (or via IDE and the source code as I do) to 
trace what that object returns and then introspect as you drill in.

I often just add '.class' to something in a template to have it output what 
kind of Java object it is, and work from there, such as 
$response.clusters.class 

 Also there seem to be some methods like display_facet_query() –
 where is this defined. Is there some documentation for this, or some way I can
 find this out? I might need to modify these values, hence my question. (I am
 completely new to Velocity – but I think I get some idea by looking at the
 templates.)

display_facet_query is a macro defined in velocity/VM_global_library.vm, which 
is the default Velocity location to put global macros that all templates can 
see. 

 2.  In http://localhost:8983/solr/browse page, we have a list of Query 
 Facets. Right now I just see two: ipod and GB?
 How are these values obtained? Do they come from elevate.xml?? Here I see 
 ipod,
 but not GB. 

They come from the definition of the /browse handler (in other words, just 
arbitrary query request parameters but hard-coded for example purposes) as: 

   <str name="facet.query">ipod</str>
   <str name="facet.query">GB</str>

The Velocity templates (facet_queries.vm in this case) dynamically generates 
the link and count display for all facet.query's in the request.

Erik




Re: Question on Solr Velocity Example

2013-01-18 Thread O. Olson




- Original Message -
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org; O. Olson olson_...@yahoo.it
Cc: 
Sent: Friday, January 18, 2013 15:20
Subject: Re: Question on Solr Velocity Example


Great question.  $response is as described here 
http://wiki.apache.org/solr/VelocityResponseWriter#Velocity_Context

You can navigate Solr's javadocs (or via IDE and the source code as I do) to 
trace what that object returns and then introspect as you drill in.

I often just add '.class' to something in a template to have it output what 
kind of Java object it is, and work from there, such as 
$response.clusters.class 


-

Thank you Erik. On the Page 
http://wiki.apache.org/solr/VelocityResponseWriter#Velocity_Context if you 
click on QueryResponse you get a 404 i.e. a link to 
http://lucene.apache.org/solr/4_0_0/solr-core/org/apache/solr/client/solrj/response/QueryResponse.html
 is a 404. 
 
Thank you for throwing light on my other questions. Your
responses helped.
 
Thank you,
O. O.


Re: Solr 4.0 doesn't send qt parameter to shards

2013-01-18 Thread Mark Miller
Yeah, shards.qt is the way to go.

I think this was basically done because of a limitation around early distrib search.

If you want to do a distrib search against a bunch of shards and you put the 
shards to search into the request handler, you then don't want to hit that same 
request handler on the subsearches or it will just keep sub searching those 
nodes over and over. So usually it was recommended you create a second search 
handler with the shards param. Distrib search is then setup so that sub 
searches call the /select handler using the params from the custom handler you 
added. shards gets removed though, and since it's not set in the /select 
handler you don't infinitely loop.

I think with something like SolrCloud this is not a problem - you don't 
hardcode shards in the request handler. It's also not a problem if you specify 
shards on your query instead.

It does make using some components awkward - it's not usually immediately clear 
how to add a distrib component - but you usually end up setting shards.qt so 
that you don't jump to the select handler. If you must hard code shards in the 
config, you can try other things like also adding the component to the /select 
handler.

I think it would be nice to clean this up a bit somehow. Or document it better.

- Mark
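
A sketch of the pattern described above, for configs that hard-code shards
(handler names and hosts are illustrative):

<!-- Clients hit this handler; it fans out to the listed shards. -->
<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">host1:8983/solr,host2:8983/solr</str>
    <!-- send sub-searches to /shard rather than the default /select -->
    <str name="shards.qt">/shard</str>
  </lst>
</requestHandler>

<!-- Same components as /distrib but no shards param, so no infinite fan-out. -->
<requestHandler name="/shard" class="solr.SearchHandler"/>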


On Jan 18, 2013, at 3:39 PM, Shawn Heisey s...@elyograg.org wrote:

 On 1/18/2013 1:20 PM, Mike Schultz wrote:
 Can someone explain the logic of not sending the qt parameter down to the
 shards?
 
 I see from here that qt is handled as a special case for ResultGrouping:
 http://lucidworks.lucidimagination.com/display/solr/Result+Grouping
 where there is a special shard.qt parameter.
 
 in 3.x solrconfig.xml supports defining a list of SearchComponents on
 handler by handler basis.  This flexibility goes away if qt isn't passed
 down or am I missing something?
 
 I'm using:
 <requestDispatcher handleSelect="true"> for the legacy behavior.  We want to
 be able to have a single endpoint (e.g. http://localhost:8983/solr/select)
 and modify query processing by varying only the query parameters.
 
 Just add shards.qt with the appropriate value in each solrconfig.xml handler 
 definition that needs it.  That should work for most use cases.
 
 Thanks,
 Shawn
 



Re: SolrCloud Performance for High Query Volume

2013-01-18 Thread Niran Fajemisin
Hi Otis,

Thanks for the response. 

The primary difference in the schema and solrconfig are the settings that are 
needed/required for 4.0 compatibility; so things like the _version_ field, schema 
version number, auto commit settings etc. 

A quick note on our SolrCloud topology: we have 2 Shards with one replica per 
shard. So essentially 2 servers per shard, which makes up the 4 servers that I 
referred to below. (Sorry for not being specific)

As for the comment about the RAM, given our SolrCloud setup we felt that we 
wouldn't need an equal amount of memory given the size of the shard would be 
roughly 50% of the entire document collection...at least that was our 
rationale. We might be totally off-base here.

Our index will contain about 175 million documents, with each document having 
about 65 fields. The actual physical size of the index is estimated at about 
75GB.

Almost 90-95% of the queries executed against the index are filter queries, as 
the site is based on faceted searches. Hence I'll say that the queries will be 
diverse, as it's based on various user driven permutations. 

We're going to need to work with our infrastructure team to determine the disk 
IO utilization between the 3.6 and 4.0 environments.

Hopefully that all makes sense.

Any immediate thoughts on any of this?

Thanks as usual.

-Niran 





 From: Otis Gospodnetic otis.gospodne...@gmail.com
To: solr-user@lucene.apache.org; Niran Fajemisin afa...@yahoo.com 
Sent: Thursday, January 17, 2013 10:12 AM
Subject: Re: SolrCloud Performance for High Query Volume
 

Hello Niran,


 Now with roughly the same schema and solrconfig configuration


Can you be more specific about what was changed and how?


 * 4 Solr server instances each with 4 CPUs (each 6 cores, 2.67GHz), 8GB of 
RAM and 150GB HDD


That's less RAM than before.  Could it be that this causes more disk IO 
because the index is not as well cached?


Note that you are comparing a non-real-time master-slave setup with a 
real-time SolrCloud setup (with an unknown number of shards, replicas, etc.)


SSDs will help if there is a lot of disk IO (i.e. if indices are big, queries 
diverse, and free memory scarce).  I'd start by looking at all system-level 
indicators and metrics. SPM for Solr may help: 
http://sematext.com/spm/solr-performance-monitoring/index.html .  Maybe you 
can show us disk IO graphs for the old cluster vs. new cluster?



Otis

--

Solr  ElasticSearch Support
http://sematext.com/









On Tue, Jan 15, 2013 at 11:54 AM, Niran Fajemisin afa...@yahoo.com wrote:

Hi all,

I'm currently in the process of doing some performance testing in 
preparation for upgrading from Solr 3.6.1 to Solr 4.0. (We're badly in need 
of NRT functionality)

Our existing deployment is not a typical deployment for Solr, as we use it to 
search and facet on financial data such as accounts, positions and 
transactions records. To make matters worse, each request could potentially 
return upwards of 50,000 or more records from the index. As I said, it's not 
an ideal use case for Solr but this is the system that is in place and it 
really can't be changed at this point. With this defined use case, our 
current 3.6.1 deployment is able to scale to about 1500 queries per minute, 
with an average response time in the low 100-200ms. Note that this time 
includes the query time and the transport time (time to stream all the 
documents to the calling services). At the 50,000 document mark, we're 
getting about 1.6-2 sec. response time. The client is willing to live with 
this as these type of requests are not very frequent.

Our hardware configuration on the 3.6.1 environment is as follows:
        * 1 Master Server for indexing with 2 CPUs (each 6 cores, 2.67GHz) & 
4GB of RAM and 150GB HDD
        * 2 Slave Servers for query only, each with 2 CPUs (each 6 cores, 
2.67GHz), with 12GB of RAM each and same HDD space. (mechanical drive)
Each of the servers are virtual servers in a VMWare environment. 

Now with roughly the same schema and solrconfig configuration, the 
performance on Solr 4.0 is quite bad. Running just 500 queries per minute our 
query performance degrades to almost 2 minute response times in some cases. 
The average is about 40-50 sec. response time. Note that the index at the 
moment is only a fraction of the size of the existing environment (about 
1/8th the size). 

The hardware setup for the SolrCloud deployment is as follows:
        * 4 Solr server instances each with 4 CPUs (each 6 cores, 2.67GHz), 
8GB of RAM and 150GB HDD

        * 3 ZooKeeper server instances. We are using each Solr server 
instance to run 1 ZK instance, with the 4th server not running a ZK server.
We haven't observed any issues with memory utilization. Additionally the 
virtual servers are co-located. We're wondering if upgrading to Solid State 
Drives would improve performance significantly?

Are there any other pointers or configuration changes that we 

Re: Long ParNew GC pauses - even when young generation is small

2013-01-18 Thread Shawn Heisey

On 1/18/2013 12:40 PM, giltene wrote:

 product bragging alert 

We've got people running Solr on the Zing JVM at various places for exactly
this reason. A key side effect of running on Zing is the complete
elimination of GC effects, with no code changes or tuning needed.

So instead of wanting pauses of half a second or less, and settling for
pauses of 2 seconds or less (per your message), you can instead actually run
on a JVM with a GC behavior that drops noise to below 20 msec with Solr. And
you can get this the day you turn Zing on, without needing to know any more
about tuning, or having to make the various interplays or tradeoffs around
things like Parnew sizing, heap sizing, occupancy thresholds, and the 75
other flags that may strongly affect your user experience.


I don't see any info on your website about pricing, so I can't make any 
decisions about whether it would be right for me.  Can you give me 
long-term pricing information?


Chances are that once I inform management of the cost, it'd never fly.

Does anyone know how to get good GC pause characteristics with Solr and 
the latest Oracle Java 7?


Thanks,
Shawn



Re: Long ParNew GC pauses - even when young generation is small

2013-01-18 Thread Mark Miller

On Jan 6, 2013, at 5:41 PM, Shawn Heisey s...@elyograg.org wrote:

 Clarification of my question and my goals:
 
 What I *want* is for all GC pauses to be half a second or less.

I'd try working with the concurrent, low pause collector. Any of the stop the 
world collectors mixed with a large heap will likely mean a few second pauses 
at least at some points. A well tuned concurrent collector will never stop the 
world in most situations.

-XX:+UseConcMarkSweepGC

I wrote an article that might be useful a while back: 
http://searchhub.org/2011/03/27/garbage-collection-bootcamp-1-0/

- Mark

Re: Long ParNew GC pauses - even when young generation is small

2013-01-18 Thread Shawn Heisey

On 1/18/2013 8:37 PM, Mark Miller wrote:


On Jan 6, 2013, at 5:41 PM, Shawn Heisey s...@elyograg.org wrote:


Clarification of my question and my goals:

What I *want* is for all GC pauses to be half a second or less.


I'd try working with the concurrent, low pause collector. Any of the stop the 
world collectors mixed with a large heap will likely mean a few second pauses 
at least at some points. A well tuned concurrent collector will never stop the 
world in most situations.

-XX:+UseConcMarkSweepGC

I wrote an article that might be useful a while back: 
http://searchhub.org/2011/03/27/garbage-collection-bootcamp-1-0/


Mark,

I have been using that collector.  When I had a very large young 
generation (NewRatio=1), most of the really long collections were 
ParNew.  When I lowered the young generation size drastically, the 
overall situation got slightly better.  Unfortunately there are still 
long pauses, but now they are CMS.  I wrote a handy little perl script 
to parse a GC log and spit out a compact listing of every line that 
takes longer than half a second.


On my dev 4.1 server with Java 7u11, I am using the G1 collector with a 
max pause target of 1500ms.  I was thinking that this collector was 
producing long pauses too, but after reviewing the gc log with a closer 
eye, I see that there are lines that specifically say pause ... and 
all of THOSE lines are below half a second except one that took 1.4 
seconds.  Does that mean that it's actually meeting the target, or are 
the other lines that show quite long time values indicative of a 
problem?  If only the lines that explicitly say pause are the ones I 
need to worry about, then it looks like G1 is the clear winner.


My production servers are version 3.5 with Java 6u38.

After reading your bootcamp and consulting a few other guides, this was 
going to be my next step:


-Xms1024
-Xmx8192
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:NewRatio=3
-XX:MaxTenuringThreshold=8

I may try the G1 collector with Java 6 in production, since I am on the 
newest Oracle version.  I am very interested in knowing whether I need 
to worry about G1 log entries that don't say pause in them.  Below is 
an excerpt from a G1 log.  Notice how only a few of the lines actually 
say pause ... in addition to these here that say (young) there are 
some pause lines that say (mixed):


http://fpaste.org/U0aQ/

Thanks,
Shawn