solrinitialisationerrors: Error during shutdown of writer.

2013-12-04 Thread Nutan
I don't know why all of a sudden I started getting this error.
This is the screenshot:
http://lucene.472066.n3.nabble.com/file/n4104874/Untitled.png 

I thought there might be some problem with Tomcat, so I uninstalled it, but I
still get the same error.
I have no idea why this is happening; initially it worked really well.
In the Tomcat Java options the home variable is: *-Dsolr.solr.home=C:\solr*
I am using the initial solr.xml only. I have created two cores and the folder
structure is as desired.
My folder structure is:
1)C:\solr\contract\conf
2)C:\solr\document\conf
3)C:\solr\lib
These are my config files:
*solr.xml*
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}"
      hostPort="8080" hostContext="solr">
    <core loadOnStartup="true" instanceDir="document\" transient="false"
        name="document"/>
    <core loadOnStartup="true" instanceDir="contract\" transient="false"
        name="contract"/>
  </cores>
</solr>

This is what I got after I re-installed Tomcat:

INFO: closing IndexWriter with IndexWriterCloser
Dec 04, 2013 2:09:30 PM org.apache.solr.update.DefaultSolrCoreState
closeIndexWriter
*SEVERE: Error during shutdown of writer.*
java.lang.NoClassDefFoundError:
org/apache/solr/request/LocalSolrQueryRequest
at
org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:682)
at
org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:69)
at
org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:278)
at
org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:73)
at org.apache.solr.core.SolrCore.close(SolrCore.java:972)
at org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:771)
at
org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:134)
at
org.apache.catalina.core.ApplicationFilterConfig.release(ApplicationFilterConfig.java:311)
at
org.apache.catalina.core.StandardContext.filterStop(StandardContext.java:4660)
at
org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5442)
at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
at
org.apache.catalina.core.ContainerBase.removeChild(ContainerBase.java:1001)
at
org.apache.catalina.startup.HostConfig.checkResources(HostConfig.java:1272)
at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1450)
at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:295)
at
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at
org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
at
org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1338)
at
org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1496)
at
org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1506)
at
org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1485)
at java.lang.Thread.run(Unknown Source)

Please help me; after implementing so much, this error has screwed me up.
*Thanks in advance.*





Faceting Query in Solr

2013-12-04 Thread kumar
Hi,

I indexed data into Solr using 5 categories. Each category is
differentiated by categoryId. Now I have a situation where I need to show the
results based on facets.

Ex:

[]-category1
[]-category2
[]-category3
[]-category4
[]-category5


If the user checks category1, it has to show the results based on
categoryId 1.

If the user checks 2 categories, it has to show the results from the two
categories which the user checked.

If the user checks 3 categories, it has to show the results from the three
categories,

and so on. For however many categories the user checks, I have to show results
from the checked categories.

My schema is as follows:

<field name="id" type="string" indexed="true" stored="true" required="true"
    multiValued="false" />
<field name="categoryId" type="int" indexed="true" stored="false"
    required="true" />
<field name="url" type="string" indexed="true" stored="true" required="true" />
<field name="content" type="string" indexed="false" stored="true"
    multiValued="true" required="true" />


Can anyone help me with how I can achieve this?

Regards,
Kumar







how to increase each index file size

2013-12-04 Thread YouPeng Yang
Hi
  I'm using SolrCloud integrated with HDFS, and I found there are lots of
small files.
  So, I'd like to increase the index file size while doing a DIH
full-import. Any suggestions on how to achieve this goal?


Regards.


Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent

2013-12-04 Thread Mhd Wrk
I'm using the following query to do a fuzzy search on Solr 4.5.1 and am
getting an empty result.

qt=standard&q=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2)
+(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO
2013-12-04T00:23:00Z] -endDate:[* TO
2013-12-04T00:23:00Z])&start=0&rows=10&fl=id

If I change it to a non-fuzzy query by simply dropping the tildes from the
terms (see below) then it returns the expected result! Is this a bug?
Shouldn't the fuzzy version of a query always return a superset of its
non-fuzzy equivalent?

qt=standard&q=+(field1|en_CA|:Swimming field1|en|:Swimming)
+(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO
2013-12-04T00:23:00Z] -endDate:[* TO
2013-12-04T00:23:00Z])&start=0&rows=10&fl=id


Solr Suggester ranked by boost

2013-12-04 Thread Mirko
I want to implement a Solr Suggester (http://wiki.apache.org/solr/Suggester)
that ranks suggestions by document boost factor.

As I understand the documentation, the following config should work:

Solrconfig.xml:

...
<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.count">7</str>
    <str name="spellcheck.onlyMorePopular">true</str>
  </lst>
  <arr name="last-components">
    <str>suggest</str>
  </arr>
</requestHandler>

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">suggesttext</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
...

Schema.xml:

...
<field name="suggesttext" type="text" indexed="true" stored="true"
    multiValued="true"/>
...
<fieldType name="text" class="solr.TextField" omitNorms="false"/>
...

I added three documents with a document boost:

{
  "add": {
    "commitWithin": 5000,
    "overwrite": true,
    "boost": 3.0,
    "doc": {
      "id": 1,
      "suggesttext": "text bb"
    }
  },
  "add": {
    "commitWithin": 5000,
    "overwrite": true,
    "boost": 2.0,
    "doc": {
      "id": 2,
      "suggesttext": "text cc"
    }
  },
  "add": {
    "commitWithin": 5000,
    "overwrite": true,
    "boost": 1.0,
    "doc": {
      "id": 3,
      "suggesttext": "text aa"
    }
  }
}

A query to the suggest handler (with spellcheck.q=te) gives the following
response:

{
  "responseHeader": {
    "status": 0,
    "QTime": 6},
  "command": "build",
  "response": {"numFound": 3, "start": 0, "docs": [
      {
        "id": 1,
        "suggesttext": ["text bb"]},
      {
        "id": 2,
        "suggesttext": ["text cc"]},
      {
        "id": 3,
        "suggesttext": ["text aa"]}]
  },
  "spellcheck": {
    "suggestions": [
      "te", {
        "numFound": 3,
        "startOffset": 0,
        "endOffset": 2,
        "suggestion": ["text aa",
          "text bb",
          "text cc"]}]}}

The search results are ranked by boost as expected. However, the
suggestions are not ranked by boost (but alphabetically instead). I also
tried the TSTLookup and FSTLookup lookup implementations with the same
result.

Any ideas what I'm missing?

Thanks,
Mirko


Re: Automatically build spellcheck dictionary on replicas

2013-12-04 Thread Mirko
Ok, thanks for pointing that out!


2013/12/3 Kydryavtsev Andrey werde...@yandex.ru

 Yep, sorry, it doesn't work for file-based dictionaries:

  In particular, you still need to index the dictionary file once by
 issuing a search with spellcheck.build=true on the end of the URL; if you
 system doesn't update that dictionary file, then this only needs to be done
 once. This manual step may be required even if your configuration sets
 build=true and reload=true.

 http://wiki.apache.org/solr/FileBasedSpellChecker

 03.12.2013, 21:27, Mirko idonthaveenoughinformat...@googlemail.com:
  Yes, I have that, but it doesn't help. It seems Solr still needs the
 query
  with the spellcheck.build parameter to build the spellchecker index.
 
  2013/12/3 Kydryavtsev Andrey werde...@yandex.ru
 
   Did you try to add
 str name=buildOnCommittrue/str
parameter to your slave's spellcheck configuration?
 
   03.12.2013, 12:04, Mirko idonthaveenoughinformat...@googlemail.com
 :
   Hi all,
   We use a Solr SpellcheckComponent with a file-based dictionary. We
 run a
   master and some replica slave servers. To update the dictionary, we
 copy
   the dictionary txt file to the master, from where it is automatically
   replicated to all slaves. However, it seems we need to run the
   spellcheck.build query on all servers individually.
 
   Is there a way to automatically build the spellcheck dictionary on all
   servers without calling spellcheck.build on all slaves individually?
 
   We use Solr 4.0.0
 
   Thanks,
   Mirko



RE: SolrCloud FunctionQuery inconsistency

2013-12-04 Thread sling
Hi Raju,
A collection is a SolrCloud concept, and a core belongs to standalone mode.
So you can create multiple cores in Solr standalone mode, not collections.





Re: post filtering for boolean filter queries

2013-12-04 Thread Dmitry Kan
Thanks Yonik.

For our use case, we would like to skip caching for just one particular filter
query, yet apply a high cost to it to make sure it executes last of all the
filter queries.

So this means the rest of the fqs will execute and be cached as usual.




On Tue, Dec 3, 2013 at 9:58 PM, Yonik Seeley yo...@heliosearch.com wrote:

 On Tue, Dec 3, 2013 at 4:45 AM, Dmitry Kan solrexp...@gmail.com wrote:
  ok, we were able to confirm the behavior regarding not caching the filter
  query. It works as expected. It does not cache with {!cache=false}.
 
  We are still looking into clarifying the cost assignment: i.e. whether it
  works as expected for long boolean filter queries.

 Yes, filters should be ordered by cost (cheapest first) whenever you
 use {!cache=false}

 -Yonik
 http://heliosearch.com -- making solr shine




-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan


Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent

2013-12-04 Thread Erik Hatcher
Chances are you're not getting those fuzzy terms analyzed as you'd like.  See 
debug (debug=true) output to be sure.  Most likely the fuzzy terms are not 
being lowercased.  See http://wiki.apache.org/solr/MultitermQueryAnalysis for 
more details (this applies to fuzzy, not just wildcard, terms too).
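A minimal SolrJ sketch of requesting that debug output (the endpoint, core name
and printed key are placeholders/assumptions, not taken from the original post):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FuzzyDebug {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("field1|en|:Swimming~2");
        // debugQuery=true adds a "parsedquery" entry showing how the fuzzy term was analyzed
        q.set("debugQuery", "true");
        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getDebugMap().get("parsedquery"));
        solr.shutdown();
    }
}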

Erik


On Dec 4, 2013, at 4:46 AM, Mhd Wrk mhd...@gmail.com wrote:

 I'm using the following query to do a fuzzy search on Solr 4.5.1 and am
 getting empty result.
 
 qt=standard&q=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2)
 +(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO
 2013-12-04T00:23:00Z] -endDate:[* TO
 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id
 
 If I change it to a not fuzzy query by simply dropping tildes from the
 terms (see below) then it returns the expected result! Is this a bug?
 Shouldn't fuzzy version of a query always return a super set of its
 not-fuzzy equivalent?
 
 qt=standard&q=+(field1|en_CA|:Swimming field1|en|:Swimming)
 +(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO
 2013-12-04T00:23:00Z] -endDate:[* TO
 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id



Re: What type of Solr plugin do I need for filtering results?

2013-12-04 Thread Thomas Seidl
Thanks a lot for both of your answers! The QParserPlugin is probably 
what I meant, but join queries also look interesting and like they could 
maybe solve my use case, too, without any custom code.
However, since this would make it impossible (I think) to have a score 
for the results, and I do want to do fulltext searches on the returned 
field set (with score), it will probably not be enough.


Anyways, I'll look into both of your suggestions. Thanks a lot again!

On 2013-12-02 05:39, Ahmet Arslan wrote:

It depends on your use case. What is your custom criterion, how is it stored, etc.?


For example, I had two tables, let's say items and permissions tables. The 
permissions table was holding itemId,userId pairs, meaning userId can see this 
itemId. My initial effort was to index items and add a multivalued field named 
WhoCanSeeMe, and filterQuery on that field using the current user.

After some time indexing became troublesome. Indexing was slowing down. I 
switched to two cores, one for each table, and used a query-time join (JoinQParser) as 
an fq. I didn't have any plugin for the above.

By the way here is an example of post filter Joel advises : 
http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/






On Monday, December 2, 2013 5:14 AM, Joel Bernstein joels...@gmail.com wrote:

What you're looking for is a QParserPlugin. Here is an example:

http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_6_0/solr/core/src/java/org/apache/solr/search/FunctionRangeQParserPlugin.java?revision=1544545&view=markup

You probably want to implement the QParserPlugin as a PostFilter.




On Sun, Dec 1, 2013 at 3:46 PM, Thomas Seidl re...@gmx.net wrote:


Hi,

I'm currently looking at writing my first Solr plugin, but I could not
really find any overview information about how a Solr request works
internally, what the control flow is and what kind of plugins are available
to customize this at which point. The Solr wiki page on plugins [1], in my
opinion, already assumes too much knowledge and is too terse in its
descriptions.

[1] http://wiki.apache.org/solr/SolrPlugins

If anyone knows of any good resources to get me started, that would be
awesome!

However, also pretty helpful would be just to know what kind of plugin I
should create for my use case, as I could then at least try to find
information specific to that. What I want to do is filter the search
results (at the time fq filters are applied, so before sorting, facetting,
range selection, etc. takes place) by some custom criterion (passed in the
URL). The plan is to add the data needed for that custom filter as a
separate set of documents to Solr and look them up from the Solr index when
filtering the query. Basically the thing discussed in [2], at 29:07.

[2] http://www.youtube.com/watch?v=kJa-3PEc90g&feature=youtu.be&t=29m7s

So, the question is, what kind of plugin would I use (and how would it
have to be configured)? I first thought it'd have to be a SearchComponent,
but I think with that I'd only get the results after they are sorted and
trimmed to the range, right?

Thanks a lot in advance,
Thomas Seidl







Solr cuts highlighted sentences

2013-12-04 Thread katoo
Hi guys,

when searching for a phrase I get results and would like to show
highlighting.
The highlightings being shown begin somewhere in the sentence, starting
with a comma or something else.
I'd like to get highlightings that begin at the start of a sentence.
How do I manage this?
I've tried so many things found on the internet, but nothing helped.
Example:

query.setHighlight(true).setParam("hl.useFastVectorHighlighte", "true");
query.setHighlight(true).setParam("hl.fragsize", "500");
query.setHighlight(true).setParam("hl.fragmenter", "regex");
query.setHighlight(true).setParam("hl.regex.slop", "0.8");
query.setHighlight(true).setParam("hl.regex.pattern",
"[\\w][^.!?]{400,600}[.!?]"); //\w[^\.!\?]{400,600}[\.!\?]
query.setHighlight(true).setParam("hl.bs.type", "SENTENCE");

etc., etc.
What's wrong with this?
Thx







Re: Using the flexible query parser in Solr instead of classic

2013-12-04 Thread Karsten R.
Hi Jack Krupansky, hi folks,

We could recreate the edismax QueryParser on top of the flexible query parser
instead of the classic one. But does anyone else need this?

The long version:

ExtendedDismaxQParser uses ExtendedSolrQueryParser.
ExtendedSolrQueryParser is derived from SolrQueryParser.
So it is based on org.apache.solr.parser.QueryParser.jj, which is a slight
modification of org.apache.lucene.queryparser.classic.QueryParser.jj.

If SolrQueryParser switches to the Lucene flexible QueryParser, the
ExtendedSolrQueryParser will be a good example of how to generate subclasses
without the classic logic of overriding the methods getFuzzyQuery,
getPrefixQuery, getWildcardQuery ...
(and instead using subclasses of FuzzyQueryNodeProcessor,
WildcardQueryNodeProcessor ...).

So again:
does anyone else need this?


Best regards
  Karsten





RE: json update moves doc to end

2013-12-04 Thread Andreas Owen
Hi Erick

Here are the last 2 results from a search, and I am not understanding why the
last one with the boost editorschoice^200 isn't at the top. By the way, can I
also give a substantial boost to results that contain the whole
search request and not just 3 or 4 letters (tokens)?

str name=dms:1003
-Infinity = (MATCH) sum of:
  0.013719446 = (MATCH) max of:
0.013719446 = (MATCH) sum of:
  2.090396E-4 = (MATCH) weight(plain_text:ber in 841)
[DefaultSimilarity], result of:
2.090396E-4 = score(doc=841,freq=8.0 = termFreq=8.0
), product of:
  0.009452709 = queryWeight, product of:
1.3343692 = idf(docFreq=611, maxDocs=855)
0.0070840283 = queryNorm
  0.022114253 = fieldWeight in 841, product of:
2.828427 = tf(freq=8.0), with freq of:
  8.0 = termFreq=8.0
1.3343692 = idf(docFreq=611, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  0.0012402858 = (MATCH) weight(plain_text:eri in 841)
[DefaultSimilarity], result of:
0.0012402858 = score(doc=841,freq=9.0 = termFreq=9.0
), product of:
  0.022357063 = queryWeight, product of:
3.1559815 = idf(docFreq=98, maxDocs=855)
0.0070840283 = queryNorm
  0.05547624 = fieldWeight in 841, product of:
3.0 = tf(freq=9.0), with freq of:
  9.0 = termFreq=9.0
3.1559815 = idf(docFreq=98, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  5.0511415E-4 = (MATCH) weight(plain_text:ric in 841)
[DefaultSimilarity], result of:
5.0511415E-4 = score(doc=841,freq=1.0 = termFreq=1.0
), product of:
  0.024712078 = queryWeight, product of:
3.4884217 = idf(docFreq=70, maxDocs=855)
0.0070840283 = queryNorm
  0.020439971 = fieldWeight in 841, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
3.4884217 = idf(docFreq=70, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  8.721528E-4 = (MATCH) weight(plain_text:ich in 841)
[DefaultSimilarity], result of:
8.721528E-4 = score(doc=841,freq=12.0 = termFreq=12.0
), product of:
  0.017446788 = queryWeight, product of:
2.4628344 = idf(docFreq=197, maxDocs=855)
0.0070840283 = queryNorm
  0.049989305 = fieldWeight in 841, product of:
3.4641016 = tf(freq=12.0), with freq of:
  12.0 = termFreq=12.0
2.4628344 = idf(docFreq=197, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  7.725705E-4 = (MATCH) weight(plain_text:cht in 841)
[DefaultSimilarity], result of:
7.725705E-4 = score(doc=841,freq=4.0 = termFreq=4.0
), product of:
  0.021610687 = queryWeight, product of:
3.050621 = idf(docFreq=109, maxDocs=855)
0.0070840283 = queryNorm
  0.035749465 = fieldWeight in 841, product of:
2.0 = tf(freq=4.0), with freq of:
  4.0 = termFreq=4.0
3.050621 = idf(docFreq=109, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  0.0010287998 = (MATCH) weight(plain_text:beri in 841)
[DefaultSimilarity], result of:
0.0010287998 = score(doc=841,freq=1.0 = termFreq=1.0
), product of:
  0.035267927 = queryWeight, product of:
4.978513 = idf(docFreq=15, maxDocs=855)
0.0070840283 = queryNorm
  0.029170973 = fieldWeight in 841, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
4.978513 = idf(docFreq=15, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  0.0010556461 = (MATCH) weight(plain_text:eric in 841)
[DefaultSimilarity], result of:
0.0010556461 = score(doc=841,freq=1.0 = termFreq=1.0
), product of:
  0.035725117 = queryWeight, product of:
5.0430512 = idf(docFreq=14, maxDocs=855)
0.0070840283 = queryNorm
  0.02954913 = fieldWeight in 841, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
5.0430512 = idf(docFreq=14, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  5.653785E-4 = (MATCH) weight(plain_text:rich in 841)
[DefaultSimilarity], result of:
5.653785E-4 = score(doc=841,freq=1.0 = termFreq=1.0
), product of:
  0.02614473 = queryWeight, product of:
3.6906586 = idf(docFreq=57, maxDocs=855)
0.0070840283 = queryNorm
  0.021624953 = fieldWeight in 841, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = termFreq=1.0
3.6906586 = idf(docFreq=57, maxDocs=855)
0.005859375 = fieldNorm(doc=841)
  0.0010596104 = (MATCH) weight(plain_text:icht in 841)
[DefaultSimilarity], result of:
0.0010596104 = score(doc=841,freq=3.0 = termFreq=3.0
), product of:
  0.027196141 = queryWeight, product of:
3.8390784 = idf(docFreq=49, maxDocs=855)

Solr Doubts

2013-12-04 Thread Jiyas Basha H
Hi Team,

I am new to Solr. 
I am trying to index 7GB CSV file. 

My questions:
1. How do I index without using a uniqueKey?

I tried with <uniqueKey required="false">id</uniqueKey>

I got: "Document is missing mandatory uniqueKey field: id"

I am using this query to update the CSV:
localhost:9050/solr-4.5.1/collection1/update/csv?stream.file=D:\Solr\comma15_Id.csv&commit=true&header=false&fieldnames=ORD,ORC,SBN,BNA,POB,NUM,DST,STM,DDL,DLO,PTN,PCD,CTA,CTP,CTT

2. How do I increase JVM heap space in Solr?
Since my file is too large, I am getting a Java heap space error.

I am not interested in splitting my large file into batches; I need to
complete indexing with the 7GB CSV file.

Please assist me with indexing my CSV file.





with regards 
Jiyas 

Problems are only opportunities with thorns on them. 


Fwd: [Solr Wiki] Your wiki account data

2013-12-04 Thread Mehdi Burgy
Hello,

We've recently launched a job search engine using Solr, and would like to
add it here: https://wiki.apache.org/solr/PublicServers

Would it be possible to allow me be part of the publishing group?

Thank you for your help

Kind Regards,

Mehdi Burgy
New Job Search Engine:
www.jobreez.com

-- Forwarded message --
From: Apache Wiki wikidi...@apache.org
Date: 2013/12/4
Subject: [Solr Wiki] Your wiki account data
To: Apache Wiki wikidi...@apache.org



Somebody has requested to email you a password recovery token.

If you lost your password, please go to the password reset URL below or
go to the password recovery page again and enter your username and the
recovery token.

Login Name: madeinch


Re: [Solr Wiki] Your wiki account data

2013-12-04 Thread Erick Erickson
Sure. Unfortunately we had a problem a while
ago with spam bots creating pages, so we had
to lock it down.

Done, you should be able to edit the Solr Wiki.

Erick


On Wed, Dec 4, 2013 at 8:06 AM, Mehdi Burgy gla...@gmail.com wrote:

 Hello,

 We've recently launched a job search engine using Solr, and would like to
 add it here: https://wiki.apache.org/solr/PublicServers

 Would it be possible to allow me be part of the publishing group?

 Thank you for your help

 Kind Regards,

 Mehdi Burgy
 New Job Search Engine:
 www.jobreez.com

 -- Forwarded message --
 From: Apache Wiki wikidi...@apache.org
 Date: 2013/12/4
 Subject: [Solr Wiki] Your wiki account data
 To: Apache Wiki wikidi...@apache.org



 Somebody has requested to email you a password recovery token.

 If you lost your password, please go to the password reset URL below or
 go to the password recovery page again and enter your username and the
 recovery token.

 Login Name: madeinch



Re: Deleting and committing inside a SearchComponent

2013-12-04 Thread Erick Erickson
I agree with Upayavira. This seems architecturally
questionable.

In your example, the crux of the matter is
"only differ by one field". Figuring that out is going to
be expensive; are you burdening searches with this
kind of logic?

Why not create a custom update processor that does
this and use such a component? Or build it into
your updates when you ingest the docs? Or build
a signature field and issue a delete by query on that
when you update?
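A rough SolrJ sketch of the signature-field idea (the endpoint, field names and
signature value are made up for illustration, not taken from this thread):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrInputDocument;

public class DedupeOnUpdate {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "feedA-2013-12-04-0001");
        doc.addField("signature", "article-123");   // same value for all near-duplicates
        doc.addField("title", "Some news item");

        // Drop any older copies sharing the signature, then add the fresh document.
        solr.deleteByQuery("signature:" + ClientUtils.escapeQueryChars("article-123"));
        solr.add(doc);
        solr.commit();
        solr.shutdown();
    }
}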

Best,
Erick


On Tue, Dec 3, 2013 at 9:48 PM, Peyman Faratin peymanfara...@gmail.comwrote:


 On Dec 3, 2013, at 8:41 PM, Upayavira u...@odoko.co.uk wrote:

 
 
  On Tue, Dec 3, 2013, at 03:22 PM, Peyman Faratin wrote:
  Hi
 
  Is it possible to delete and commit updates to an index inside a custom
  SearchComponent? I know I can do it with solrj but due to several
  business logic requirements I need to build the logic inside the search
  component.  I am using SOLR 4.5.0.
 
  That just doesn't make sense. Search components are read only.
 
 I can think of many situations where it makes sense. For instance, you
 search for a document and your index contains many duplicates that only
 differ by one field, such as the time they were indexed (think news feeds
 from multiple sources). So after the search we want to delete the duplicate
 documents that satisfy some policy (here date, but it could be some other
 policy).

  What are you trying to do? What stuff do you need to change? Could you
  do it within an UpdateProcessor?

 The solution I am working with:

 UpdateRequestProcessorChain processorChain =
 rb.req.getCore().getUpdateProcessingChain(rb.req.getParams().get(UpdateParams.UPDATE_CHAIN));
 UpdateRequestProcessor processor = processorChain.createProcessor(rb.req,
 rb.rsp);
 ...
 docId = f();
 ...
 DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
 cmd.setId(docId.toString());
 processor.processDelete(cmd);


 
  Upayavira




Re: solrinitialisationerrors: Error during shutdown of writer.

2013-12-04 Thread Erick Erickson
The crux is: java.lang.NoClassDefFoundError:

Usually this means your classpath is wrong and
the JVM can't find the jars. Or you have multiple
jars from different versions in your classpath.

It's pretty tedious to track down, but that's where I'd
start.

In your log, you'll see a bunch of lines like this:
2794 [coreLoadExecutor-3-thread-1] INFO
 org.apache.solr.core.SolrResourceLoader  – Adding
'file:/Users/Erick/apache/4x/solr/contrib/clustering/lib/jackson-mapper-asl-1.7.4.jar'
to classloader

showing you exactly where Solr is trying to load jars from,
that'll help.

Best,
Erick


On Wed, Dec 4, 2013 at 4:08 AM, Nutan nutanshinde1...@gmail.com wrote:

 I dont why all of a sudden I started getting this errror :
 this is the sreenshot:
 http://lucene.472066.n3.nabble.com/file/n4104874/Untitled.png

 I thought there might be some problem with tomcat,so I uninstalled it ,but
 i
 still get the same error.
 I have no idea why is this happening,initially it worked really well.
 In tomcat java-options home var is : *-Dsolr.solr.home=C:\solr*
 I am using the initial solr.xml only,I have  created two cores n folder
 structure is as desired.
 My folder structure is:
 1)C:\solr\contract\conf
 2)C:\solr\document\conf
 3)C:\solr\lib
 These are my config files:
 *solr.xml*
  <?xml version="1.0" encoding="UTF-8" ?>
  <solr persistent="true" sharedLib="lib">
    <cores adminPath="/admin/cores"
  zkClientTimeout="${zkClientTimeout:15000}"
  hostPort="8080" hostContext="solr">
  <core loadOnStartup="true" instanceDir="document\" transient="false"
  name="document"/>
  <core loadOnStartup="true" instanceDir="contract\" transient="false"
  name="contract"/>
    </cores>
  </solr>

 This i got after i re-installed tomcat:

 INFO: closing IndexWriter with IndexWriterCloser
 Dec 04, 2013 2:09:30 PM org.apache.solr.update.DefaultSolrCoreState
 closeIndexWriter
 *SEVERE: Error during shutdown of writer.*
 java.lang.NoClassDefFoundError:
 org/apache/solr/request/LocalSolrQueryRequest
 at

 org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:682)
 at

 org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:69)
 at

 org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:278)
 at

 org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:73)
 at org.apache.solr.core.SolrCore.close(SolrCore.java:972)
 at
 org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:771)
 at

 org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:134)
 at

 org.apache.catalina.core.ApplicationFilterConfig.release(ApplicationFilterConfig.java:311)
 at

 org.apache.catalina.core.StandardContext.filterStop(StandardContext.java:4660)
 at

 org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5442)
 at
 org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
 at
 org.apache.catalina.core.ContainerBase.removeChild(ContainerBase.java:1001)
 at
 org.apache.catalina.startup.HostConfig.checkResources(HostConfig.java:1272)
 at
 org.apache.catalina.startup.HostConfig.check(HostConfig.java:1450)
 at
 org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:295)
 at

 org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
 at

 org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
 at

 org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1338)
 at

 org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1496)
 at

 org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1506)
 at

 org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1485)
 at java.lang.Thread.run(Unknown Source)

 Please help me, after implementing so much this error has screwed me up.
 *Thanks in advance.*






Re: Faceting Query in Solr

2013-12-04 Thread Erick Erickson
The standard way of handling this kind of thing is with
filter queries. For multi-select, you have to put in some
javascript or something to make an OR clause when they
check the boxes.

So your query looks like fq=categoryID:(1 OR 2 OR 3)
rather than
fq=categoryID:1&fq=categoryID:2&fq=categoryID:3
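A small SolrJ sketch of building that single OR'd filter from the checked boxes
(the endpoint is a placeholder; the field name follows the schema in the question):

import java.util.Arrays;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class CategoryFilter {
    public static void main(String[] args) throws Exception {
        List<String> checked = Arrays.asList("1", "3");   // categories the user ticked

        // One fq joining the selections with OR, e.g. categoryId:(1 OR 3)
        StringBuilder fq = new StringBuilder("categoryId:(");
        for (int i = 0; i < checked.size(); i++) {
            if (i > 0) fq.append(" OR ");
            fq.append(checked.get(i));
        }
        fq.append(")");

        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery(fq.toString());
        q.setFacet(true);
        q.addFacetField("categoryId");
        System.out.println(solr.query(q).getResults().getNumFound());
        solr.shutdown();
    }
}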

Best,
Erick


On Wed, Dec 4, 2013 at 4:36 AM, kumar pavan2...@gmail.com wrote:

 Hi,

 I indexed data into solr by using 5 categories. Each category is
 differentiated by categoryId. Now i have a situation that i need to show
 the
 results based on facets.

 Ex:

 []-category1
 []-category2
 []-category3
 []-category4
 []-category5


 If the user checks the category1 it has to show the results based on
 categoryId-1

 If the user checks 2 categories it has to show the results from two
 categories which the user checked

 If the user checks 3 categories it has to show the results from three
 categories

 and son on.like how many categories user checked i have to show results
 from checked categories

 My Schema is in the following way..

 <field name="id" type="string" indexed="true" stored="true" required="true"
 multiValued="false" />
 <field name="categoryId" type="int" indexed="true" stored="false"
 required="true" />
 <field name="url" type="string" indexed="true" stored="true"
 required="true" />
 <field name="content" type="string" indexed="false" stored="true"
 multiValued="true" required="true" />


 Anyone help me how can i achieve this.

 Regards,
 Kumar








Solr Performance Issue

2013-12-04 Thread kumar
I have almost 5 to 6 crores of indexed documents in Solr, and when I
change anything in the configuration file the Solr server goes
down.

As a new user to Solr, I am not able to find the exact reason why the server
goes down.

I am using caches in the following way:

<filterCache class="solr.FastLRUCache"
             size="16384"
             initialSize="4096"
             autowarmCount="4096"/>
<queryResultCache class="solr.FastLRUCache"
             size="16384"
             initialSize="4096"
             autowarmCount="1024"/>

and I am not using any documentCache or fieldValueCache.

Can this lead to a performance issue that makes the server go down?

And in the server log I am seeing an exception of the following
form:


Servlet.service() for servlet [default] in context with path [/solr] threw
exception [java.lang.IllegalStateException: Cannot call sendError() after
the response has been committed] with root cause



Can anybody help me solve this problem?

Kumar.











Re: post filtering for boolean filter queries

2013-12-04 Thread Erick Erickson
OK, so cache=false and cost=100 should do it, see:
http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/
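For reference, a hedged SolrJ sketch of what that can look like when only one
expensive fq should skip the cache while the others stay cached (the field
names and endpoint are made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class UncachedExpensiveFilter {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("some query");
        // Cached and executed as usual:
        q.addFilterQuery("type:document");
        // Not cached; cost=100 pushes it to run after the cheaper filters
        // (and, for query types that support PostFilter, as a post filter):
        q.addFilterQuery("{!cache=false cost=100}big_boolean_field:(a OR b OR c)");
        System.out.println(solr.query(q).getResults().getNumFound());
        solr.shutdown();
    }
}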

Best,
Erick


On Wed, Dec 4, 2013 at 5:56 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Thanks Yonik.

 For our use case, we would like to skip caching only one particular filter
 cache, yet apply a high cost for it to make sure it executes last of all
 filter queries.

 So this means, the rest of the fqs will execute and cache as usual.




 On Tue, Dec 3, 2013 at 9:58 PM, Yonik Seeley yo...@heliosearch.com
 wrote:

  On Tue, Dec 3, 2013 at 4:45 AM, Dmitry Kan solrexp...@gmail.com wrote:
   ok, we were able to confirm the behavior regarding not caching the
 filter
   query. It works as expected. It does not cache with {!cache=false}.
  
   We are still looking into clarifying the cost assignment: i.e. whether
 it
   works as expected for long boolean filter queries.
 
  Yes, filters should be ordered by cost (cheapest first) whenever you
  use {!cache=false}
 
  -Yonik
  http://heliosearch.com -- making solr shine
 



 --
 Dmitry
 Blog: http://dmitrykan.blogspot.com
 Twitter: twitter.com/dmitrykan



Re: how to increase each index file size

2013-12-04 Thread Erick Erickson
Why do you want to do this? Are you seeing performance problems?
If not, I'd just ignore this problem, premature optimization and all that.

If you _really_ want to do this: your segment files are closed every
time you do a commit; openSearcher=true|false doesn't matter.

BUT, the longer you go between commits, the bigger your transaction log will
be, which may lead to other issues, particularly on restart. See:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

The key is the section on truncating the tlog.

And note the sizes of these segments will change as they're
merged anyway.

Best,
Erick


On Wed, Dec 4, 2013 at 4:42 AM, YouPeng Yang yypvsxf19870...@gmail.comwrote:

 Hi
   I'm using the SolrCloud integreted with HDFS,I found there are lots of
 small size files.
   So,I'd like to increase  the index  file size  while doing DIH
 full-import. Any suggestion to achieve this goal.


 Regards.



Re: Solr Performance Issue

2013-12-04 Thread Erick Erickson
You need to give us more of the exception trace,
the real cause is often buried down the stack with
some text like
Caused by...

But at a glance your cache sizes and autowarm counts
are far higher than they should be. Try reducing
particularly the autowarm count down to, say, 16 or so.
It's actually rare that you really need very many.

I'd actually go back to the defaults to start with to test
whether this is the problem.

Further, we need to know exactly what you mean by
"change anything in the configuration file". Change
what? Details matter.

Of course the last thing you changed before you started
seeing this problem is the most likely culprit.

Best,
Erick


On Wed, Dec 4, 2013 at 8:31 AM, kumar pavan2...@gmail.com wrote:

 I am having almost 5 to 6 crores of indexed documents in solr. And when i
 am
 going to change anything in the configuration file solr server is going
 down.

 As a new user to solr i can't able to find the exact reason for going
 server
 down.

 I am using cache's in the following way :

 <filterCache class="solr.FastLRUCache"
  size="16384"
  initialSize="4096"
  autowarmCount="4096"/>
  <queryResultCache class="solr.FastLRUCache"
  size="16384"
  initialSize="4096"
  autowarmCount="1024"/>

 and i am not using any documentCache, fieldValueCahe's

 Whether this can lead any performance issue means going server down.

 And i am seeing logging in the server it is showing exception in the
 following way


 Servlet.service() for servlet [default] in context with path [/solr] threw
 exception [java.lang.IllegalStateException: Cannot call sendError() after
 the response has been committed] with root cause



 Can anybody help me how can i solve this problem.

 Kumar.












Re: json update moves doc to end

2013-12-04 Thread Erick Erickson
Well, both have a score of -Infinity. So they're equal and
the tiebreaker is the internal Lucene doc ID.

Now, this is not that helpful since the question becomes where
-Infinity comes from; this looks suspicious:
 -Infinity = (MATCH) FunctionQuery(log(int(clicks))), product of:
-Infinity = log(int(clicks)=0)

not much help I know, but
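One hedged thought (an assumption on my part, not something established in this
thread): if the function comes from something like an edismax boost parameter you
control, keeping log() away from zero, e.g. boost=log(sum(int(clicks),1)), makes
documents with no clicks score log(1)=0 instead of -Infinity.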

Erick


On Wed, Dec 4, 2013 at 7:24 AM, Andreas Owen a...@conx.ch wrote:

 Hi Erick

 Here are the last 2 results from a search and i am not understanding why
 the
 last one with the boost editorschoice^200 isn't at the top. By the way can
 i
 also give a substantial boost to results that contain the hole
 search-request and not just 3 or 4 letters (tokens)?

 str name=dms:1003
 -Infinity = (MATCH) sum of:
   0.013719446 = (MATCH) max of:
 0.013719446 = (MATCH) sum of:
   2.090396E-4 = (MATCH) weight(plain_text:ber in 841)
 [DefaultSimilarity], result of:
 2.090396E-4 = score(doc=841,freq=8.0 = termFreq=8.0
 ), product of:
   0.009452709 = queryWeight, product of:
 1.3343692 = idf(docFreq=611, maxDocs=855)
 0.0070840283 = queryNorm
   0.022114253 = fieldWeight in 841, product of:
 2.828427 = tf(freq=8.0), with freq of:
   8.0 = termFreq=8.0
 1.3343692 = idf(docFreq=611, maxDocs=855)
 0.005859375 = fieldNorm(doc=841)
   0.0012402858 = (MATCH) weight(plain_text:eri in 841)
 [DefaultSimilarity], result of:
 0.0012402858 = score(doc=841,freq=9.0 = termFreq=9.0
 ), product of:
   0.022357063 = queryWeight, product of:
 3.1559815 = idf(docFreq=98, maxDocs=855)
 0.0070840283 = queryNorm
   0.05547624 = fieldWeight in 841, product of:
 3.0 = tf(freq=9.0), with freq of:
   9.0 = termFreq=9.0
 3.1559815 = idf(docFreq=98, maxDocs=855)
 0.005859375 = fieldNorm(doc=841)
   5.0511415E-4 = (MATCH) weight(plain_text:ric in 841)
 [DefaultSimilarity], result of:
 5.0511415E-4 = score(doc=841,freq=1.0 = termFreq=1.0
 ), product of:
   0.024712078 = queryWeight, product of:
 3.4884217 = idf(docFreq=70, maxDocs=855)
 0.0070840283 = queryNorm
   0.020439971 = fieldWeight in 841, product of:
 1.0 = tf(freq=1.0), with freq of:
   1.0 = termFreq=1.0
 3.4884217 = idf(docFreq=70, maxDocs=855)
 0.005859375 = fieldNorm(doc=841)
   8.721528E-4 = (MATCH) weight(plain_text:ich in 841)
 [DefaultSimilarity], result of:
 8.721528E-4 = score(doc=841,freq=12.0 = termFreq=12.0
 ), product of:
   0.017446788 = queryWeight, product of:
 2.4628344 = idf(docFreq=197, maxDocs=855)
 0.0070840283 = queryNorm
   0.049989305 = fieldWeight in 841, product of:
 3.4641016 = tf(freq=12.0), with freq of:
   12.0 = termFreq=12.0
 2.4628344 = idf(docFreq=197, maxDocs=855)
 0.005859375 = fieldNorm(doc=841)
   7.725705E-4 = (MATCH) weight(plain_text:cht in 841)
 [DefaultSimilarity], result of:
 7.725705E-4 = score(doc=841,freq=4.0 = termFreq=4.0
 ), product of:
   0.021610687 = queryWeight, product of:
 3.050621 = idf(docFreq=109, maxDocs=855)
 0.0070840283 = queryNorm
   0.035749465 = fieldWeight in 841, product of:
 2.0 = tf(freq=4.0), with freq of:
   4.0 = termFreq=4.0
 3.050621 = idf(docFreq=109, maxDocs=855)
 0.005859375 = fieldNorm(doc=841)
   0.0010287998 = (MATCH) weight(plain_text:beri in 841)
 [DefaultSimilarity], result of:
 0.0010287998 = score(doc=841,freq=1.0 = termFreq=1.0
 ), product of:
   0.035267927 = queryWeight, product of:
 4.978513 = idf(docFreq=15, maxDocs=855)
 0.0070840283 = queryNorm
   0.029170973 = fieldWeight in 841, product of:
 1.0 = tf(freq=1.0), with freq of:
   1.0 = termFreq=1.0
 4.978513 = idf(docFreq=15, maxDocs=855)
 0.005859375 = fieldNorm(doc=841)
   0.0010556461 = (MATCH) weight(plain_text:eric in 841)
 [DefaultSimilarity], result of:
 0.0010556461 = score(doc=841,freq=1.0 = termFreq=1.0
 ), product of:
   0.035725117 = queryWeight, product of:
 5.0430512 = idf(docFreq=14, maxDocs=855)
 0.0070840283 = queryNorm
   0.02954913 = fieldWeight in 841, product of:
 1.0 = tf(freq=1.0), with freq of:
   1.0 = termFreq=1.0
 5.0430512 = idf(docFreq=14, maxDocs=855)
 0.005859375 = fieldNorm(doc=841)
   5.653785E-4 = (MATCH) weight(plain_text:rich in 841)
 [DefaultSimilarity], result of:
 5.653785E-4 = score(doc=841,freq=1.0 = termFreq=1.0
 ), product of:
   0.02614473 = queryWeight, product of:
 3.6906586 = idf(docFreq=57, maxDocs=855)
 0.0070840283 = queryNorm
 

Re: Solr Doubts

2013-12-04 Thread Erick Erickson
bq: <uniqueKey required="false">id</uniqueKey>

This isn't correct; there's no required param for
uniqueKey. Just remove the entire uniqueKey node
AND make the field definition required="false". I.e. you
should have something like:
<field name="id" type="string" indexed="true" stored="true" required="true"
multiValued="false" />
and set required="false" there.

To increase memory, you just specify -Xmx when you start,
something like:
java -Xmx2G -Xms2G -jar start.jar

But interested or not in splitting the csv file, working with 7G
input files is going to be painful no matter what. You may
find yourself having to split it up for expediency's sake.

Best,
Erick


On Wed, Dec 4, 2013 at 7:46 AM, Jiyas Basha H jiyasbas...@mobiusservices.in
 wrote:

 Hai Team,

 I am new to Solr.
 I am trying to index 7GB CSV file.

 My questions:
 1.How to index without using uniquekey ?

 I tried with <uniqueKey required="false">id</uniqueKey>

 I got -- Document is missing mandatory uniqueKey field: id

 I am using query to update csv :
 localhost:9050/solr-4.5.1/collection1/update/csv?stream.file=D:\Solr\comma15_Id.csv&commit=true&header=false&fieldnames=ORD,ORC,SBN,BNA,POB,NUM,DST,STM,DDL,DLO,PTN,PCD,CTA,CTP,CTT

 2. how to increase jvm heap space in solr ?
 since my file is too large i am getting java heap space error

 I am not interested to split my large file into batches.however i need to
 complete indexing with 7GB CSV file.

 please assist me to index my csv file





 with regards
 Jiyas

 Problems are only opportunities with thorns on them.



Programmatically upload configuration into ZooKeeper

2013-12-04 Thread Artem Karpenko
What is the best way to upload Solr configuration files into ZooKeeper 
programmatically, i.e. - from within Java code?
I know that there are cloud-scripts for this, but in the end they should 
use some Java client library, don't they?


This question arose because we use a special configuration system 
(Java-based) to store all configuration files (not only Solr's), and it'd 
be cool if we could
export modified files into ZooKeeper when applying changes. We would 
then reload collections remotely via the REST API.


I've dug a little into the ZkCli class and it seems that SolrZkClient can 
do something along the lines above. Is it the right tool for the job?


Any hints would be appreciated.

Regards,
Artem.


Questions about commits and OOE

2013-12-04 Thread OSMAN Metin
Hi all,

let me first explain our situation :

We have


-   two virtual servers with each :

4x SolR 4.4.0 on Tomcat 6 (+ with mod_cluster 1.2.0), each JVM has -Xms2048m 
-Xmx2048m -XX:MaxPermSize=384m
1x Zookeeper 3.4.5 (Only one of the two Zookeeper is active.)
CentOS 6.4
Sun JDK 1.6.0-31
16 GB of RAM
4 vCPU


-   only one core and one shard

-   ~25 docs and 50-100 MB of index size

-   two load balancers (apache + mod_cluster) who are both connected to the 
8 SolR nodes

-   1 VIP pointing to these two LB

The commit configuration is

-   every update request does a soft commit (i.e. the param softCommit=true in 
the HTTP request)

-   autosoftcommit disabled

-   autocommit enabled every 15 seconds

The client application is a java app with SolRj client using the previous VIP 
as an endpoint.
We need NearRealTime modifications visible by the end users.
During the day, the client uses SolR with about 80% of select requests and 20% 
of update requests.
Every morning, the client is sending a massive bunch of updates (about 1 in 
a few minutes).

During this massive update, we sometimes have a peak of active threads 
exceeding the limit of 8192 processes authorized for the user running the Tomcat 
and ZooKeeper processes.
When this happens, every hardCommit is failing with an OutOfMemory : unable to 
create native thread message.


Now, I have some questions :

-   Why are there so many threads created? Is it the softCommit on every 
update that opens a new thread?

-   Once an OOE occurs, every hardCommit will be broken, even if the number 
of threads open on the system is low. Is there any way to free the JVM? 
The only solution we have found is to restart all the JVMs.

-   When the OOE occurs, the SolR cloud console shows the leader node as 
active and the others as recovering:

o   is the replication working at that moment?

o   as all the hardCommits are failing but the softCommits are not, can I be sure 
that I will not lose some updates when restarting all the nodes?

By the way, we are planning to

-   disable the softCommit parameter on the client side and to enable the 
autosoftcommit instead.

-   create another server and make a 3-node ZooKeeper quorum instead of a single 
ZooKeeper master.

-   skip the use of load balancers and let ZooKeeper decide which node will 
respond to the requests

Any help would be appreciated !

Metin OSMAN


Re: Using Payloads as a Coefficient For Score At a Custom QParser That extends ExtendedDismaxQParser

2013-12-04 Thread Joel Bernstein
Sounds great Furkan,

Do you have permission to donate this code? It would be great if you
could create a JIRA ticket.

Thanks,
Joel


On Tue, Dec 3, 2013 at 3:26 PM, Furkan KAMACI furkankam...@gmail.comwrote:

 I've implemented what I want. I can add the payload score into the document
 score. I've modified ExtendedDismaxQParser and I can use all the abilities
 of edismax in my case. I will explain what I did on my blog.

 Thanks;
 Furkan KAMACI


 2013/12/1 Furkan KAMACI furkankam...@gmail.com

  Hi;
 
  I use Solr 4.5.1. I have a case: when a user searches for some specific
  keywords, some documents should be listed much higher than their usual
  score. I mean I have probabilities of which documents the user may want to
  see for the given keywords.
 
  I have come up with this idea: I can put a new field in my schema. This
  field holds the keyword and probability as a payload. When a user searches for
  a keyword I will calculate the usual document score for the given fields, and I
  will also make a search on the payloaded field and multiply the total score
  with that payload.
 
  I followed this example:
  http://sujitpal.blogspot.com/2013/07/porting-payloads-to-solr4.html#!
  However, that example extends QParser directly, but I want to use the
  capabilities of edismax.
 
  So I found this example:

  http://digitalpebble.blogspot.com/2010/08/using-payloads-with-dismaxqparser-in.html
  This one extends dismax, but I could not use payloads in that example.
 
  I want to combine the above two solutions. The first solution has this case:
 
  @Override
  public Similarity get(String name) {
  if ("payloads".equals(name) || "cscores".equals(name)) {
  return new PayloadSimilarity();
  } else {
  return new DefaultSimilarity();
  }
  }
 
  However, dismax behaves differently, i.e. when you search for cscores:A it
  changes that into this:
 
  *+((text:cscores:y text:cscores text:y text:cscoresy)) ()*
 
  When I debug it, the name is text instead of cscores and it does not work. My
  idea is to combine the two examples and extend edismax. Do you have any idea how
  to extend it for edismax, or any idea what to do for my case?
 
  *PS:* I've sent the same question to the Lucene user list too. I ask it here to
  get an idea from the Solr perspective too.
 
  Thanks;
  Furkan KAMACI
 




-- 
Joel Bernstein
Search Engineer at Heliosearch


Re: Programmatically upload configuration into ZooKeeper

2013-12-04 Thread Greg Walters
Hi Artem,

This question (or one very like it) has been asked on this list before so 
there's some prior art you could modify to suit your needs.

Taken from Timothy Potter thelabd...@gmail.com:

**
    public static void updateClusterstateJsonInZk(CloudSolrServer
cloudSolrServer, CommandLine cli) throws Exception {
    String updateClusterstateJson =
cli.getOptionValue("updateClusterstateJson");

    ZkStateReader zkStateReader = cloudSolrServer.getZkStateReader();
    SolrZkClient zkClient = zkStateReader.getZkClient();

    File jsonFile = new File(updateClusterstateJson);
    if (!jsonFile.isFile()) {
    System.err.println(jsonFile.getAbsolutePath() + " not found.");
    return;
    }

    byte[] clusterstateJson = readFile(jsonFile);

    // validate that what the user is passing is valid JSON
    InputStreamReader bytesReader = new InputStreamReader(new
ByteArrayInputStream(clusterstateJson), "UTF-8");
    JSONParser parser = new JSONParser(bytesReader);
    parser.toString();

    zkClient.setData("/clusterstate.json", clusterstateJson, true);
    System.out.println("Updated /clusterstate.json with data from "
+ jsonFile.getAbsolutePath());
    }
**

You should be able to modify that or use it as a basis for uploading the 
changed files in your config.
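A minimal sketch of the upload itself with SolrZkClient (the ZooKeeper address,
config name and local path are assumptions; error handling is omitted):

import java.io.File;
import java.nio.file.Files;
import org.apache.solr.common.cloud.SolrZkClient;

public class UploadConfig {
    public static void main(String[] args) throws Exception {
        SolrZkClient zkClient = new SolrZkClient("localhost:2181", 30000);
        try {
            File confDir = new File("/path/to/myconfig/conf");
            for (File f : confDir.listFiles()) {
                if (!f.isFile()) continue;
                byte[] data = Files.readAllBytes(f.toPath());
                String zkPath = "/configs/myconfig/" + f.getName();
                // Create the node on first upload, overwrite its data afterwards.
                if (zkClient.exists(zkPath, true)) {
                    zkClient.setData(zkPath, data, true);
                } else {
                    zkClient.makePath(zkPath, data, true);
                }
            }
        } finally {
            zkClient.close();
        }
    }
}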

Thanks,
Greg

On Dec 4, 2013, at 8:36 AM, Artem Karpenko gooy...@gmail.com wrote:

 What is the best way to upload Solr configuration files into ZooKeeper 
 programmatically, i.e. - from within Java code?
 I know that there are cloud-scripts for this, but in the end they should use 
 some Java client library, don't they?
 
 This question raised because we use special configuration system (Java-based) 
 to store all configuration files (not only Solr) and it'd be cool if we could
 export modified files into ZooKeeper when applying changes. We would then 
 reload collections remotely via REST API.
 
 I've digged a little into ZkCli class and it seems that SolrZkClient can do 
 something along the lines above. Is it the right tool for the job?
 
 Any hints would be appreciated.
 
 Regards,
 Artem.



Re: Solr Performance Issue

2013-12-04 Thread Shawn Heisey
On 12/4/2013 6:31 AM, kumar wrote:
 I am having almost 5 to 6 crores of indexed documents in solr. And when i am
 going to change anything in the configuration file solr server is going
 down.

If you mean crore and not core, then you are talking about 50 to 60
million documents.  That's a lot.  Solr is perfectly capable of handling
that many documents, but you do need to have very good hardware.

Even if they are small, your index is likely to be many gigabytes in
size.  If the documents are large, that might be measured in terabytes.
 Large indexes require a lot of memory for good performance.  This will
be discussed in more detail below.

 As a new user to solr i can't able to find the exact reason for going server
 down.
 
 I am using cache's in the following way :
 
  <filterCache class="solr.FastLRUCache"
   size="16384"
   initialSize="4096"
   autowarmCount="4096"/>
   <queryResultCache class="solr.FastLRUCache"
   size="16384"
   initialSize="4096"
   autowarmCount="1024"/>
 
 and i am not using any documentCache, fieldValueCahe's

As Erick said, these cache sizes are HUGE.  In particular, your
autowarmCount values are extremely high.

 Whether this can lead any performance issue means going server down.

Another thing that Erick pointed out is that you haven't really told us
what's happening.  When you say that the server goes down, what EXACTLY
do you mean?

 And i am seeing logging in the server it is showing exception in the
 following way
 
 
 Servlet.service() for servlet [default] in context with path [/solr] threw
 exception [java.lang.IllegalStateException: Cannot call sendError() after
 the response has been committed] with root cause

This message comes from your servlet container, not Solr.  You're
probably using Tomcat, not the included Jetty.  There is some indirect
evidence that this can be fixed by increasing the servlet container's
setting for the maximum number of request parameters.

http://forums.adobe.com/message/4590864

Here's what I can say without further information:

You're likely having performance issues.  One potential problem is your
insanely high autowarmCount values.  Your cache configuration tells Solr
that every time you have a soft commit or a hard commit with
openSearcher=true, you're going to execute up to 1024 queries and up to
4096 filters from the old caches, in order to warm the new caches.  Even
if you have an optimal setup, this takes a lot of time.  I suspect that
you don't have an optimal setup.

Another potential problem is that you don't have enough memory for the
size of your index.  A number of potential performance problems are
discussed on this wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems

A lot more details are required.  Here's some things that will be
helpful, and more is always better:

* Exact symptoms.
* Excerpts from the Solr logfile that include entire stacktraces.
* Operating system and version.
* Total server index size on disk.
* Total machine memory.
* Java heap size for your servlet container.
* Which servlet container you are using to run Solr.
* Solr version.
* Server hardware details.

Thanks,
Shawn



RE: Questions about commits and OOE

2013-12-04 Thread Tim Potter
Hi Metin,

I think removing the softCommit=true parameter on the client side will 
definitely help as NRT wasn't designed to re-open searchers after every 
document. Try every 1 second (or even every few seconds), I doubt your users 
will notice. To get an idea of what threads are running in your JVM process, 
you can use jstack.
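A hedged alternative sketch for a JVM you control (jstack works from outside; the
standard JMX thread bean gives a similar picture from inside the process):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadSnapshot {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        System.out.println("live threads: " + threads.getThreadCount()
                + ", peak: " + threads.getPeakThreadCount());
        // Print every thread's name and state (no lock details requested here).
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            System.out.println(info.getThreadName() + " -> " + info.getThreadState());
        }
    }
}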

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: OSMAN Metin metin.os...@canal-plus.com
Sent: Wednesday, December 04, 2013 7:36 AM
To: solr-user@lucene.apache.org
Subject: Questions about commits and OOE

Hi all,

let me first explain our situation :

We have


-   two virtual servers with each :

4x SolR 4.4.0 on Tomcat 6 (+ with mod_cluster 1.2.0), each JVM has -Xms2048m 
-Xmx2048m -XX:MaxPermSize=384m
1x Zookeeper 3.4.5 (Only one of the two Zookeeper is active.)
CentOS 6.4
Sun JDK 1.6.0-31
16 GB of RAM
4 vCPU


-   only one core and one shard

-   ~25 docs and 50-100 MB of index size

-   two load balancers (apache + mod_cluster) who are both connected to the 
8 SolR nodes

-   1 VIP pointing to these two LB

The commit configuration is

-   every update request do a soft commit (i.e. param softCommit=true in 
the http request)

-   autosoftcommit disabled

-   autocommit enabled every 15 seconds

The client application is a java app with SolRj client using the previous VIP 
as an endpoint.
We need NearRealTime modifications visible by the end users.
During the day, the client uses SolR with about 80% of select requests and 20% 
of update requests.
Every morning, the client is sending a massive bunch of updates (about 1 in 
a few minutes).

During this massive update, we have sometimes a peak of active threads 
exceeding the limit of 8192 process authorized for the user running the tomcat 
and zookeeper process.
When this happens, every hardCommit is failing with an OutOfMemory : unable to 
create native thread message.


Now, I have some questions :

-   Why are there some many threads created ? Is the softCommit on every 
update that opens a new thread ?

-   Once an OOE occurs, every hardcommit will be broken, even if the number 
of threads opened on the system is low. Is there any way to free the JVM ? 
The only solution we have found is to restart all the JVM.

-   When the OOE occurs, the SolR cloud console shows the leader node as 
active and the others as recovering

o   is the replication working at that moment ?

o   as all the hardcommits are failing but the softcommits are not, can I be sure
that I will not lose some updates when restarting all the nodes?

By the way, we are planning to

-   disable the softCommit parameter on the client side and to enable the 
autosoftcommit instead.

-   create another server and make a 3-node zookeeper quorum instead of a unique
zookeeper master.

-   skip the use of load balancers and let zookeeper decide which node will 
respond to the requests

Any help would be appreciated !

Metin OSMAN


Re: Programmatically upload configuration into ZooKeeper

2013-12-04 Thread Artem Karpenko

Hello Greg,

so it's SolrZkClient indeed. I've tried it out and it seems to do just 
the job I need. Thank you!


On a related note - is there a similar way to create/reload a
core/collection, using maybe CloudSolrServer or something inside it? I didn't
find any methods that could do it.


Regards,
Artem.

04.12.2013 17:15, Greg Walters пишет:

Hi Artem,

This question (or one very like it) has been asked on this list before so 
there's some prior art you could modify to suit your needs.

Taken from Timothy Potter thelabd...@gmail.com:

**
public static void updateClusterstateJsonInZk(CloudSolrServer
    cloudSolrServer, CommandLine cli) throws Exception {
  String updateClusterstateJson =
      cli.getOptionValue("updateClusterstateJson");

  ZkStateReader zkStateReader = cloudSolrServer.getZkStateReader();
  SolrZkClient zkClient = zkStateReader.getZkClient();

  File jsonFile = new File(updateClusterstateJson);
  if (!jsonFile.isFile()) {
    System.err.println(jsonFile.getAbsolutePath() + " not found.");
    return;
  }

  byte[] clusterstateJson = readFile(jsonFile);

  // validate that what the user is passing in is valid JSON
  InputStreamReader bytesReader = new InputStreamReader(
      new ByteArrayInputStream(clusterstateJson), "UTF-8");
  JSONParser parser = new JSONParser(bytesReader);
  parser.toString();

  zkClient.setData("/clusterstate.json", clusterstateJson, true);
  System.out.println("Updated /clusterstate.json with data from "
      + jsonFile.getAbsolutePath());
}
**

You should be able to modify that or use it as a basis for uploading the 
changed files in your config.

Thanks,
Greg

On Dec 4, 2013, at 8:36 AM, Artem Karpenko gooy...@gmail.com wrote:


What is the best way to upload Solr configuration files into ZooKeeper 
programmatically, i.e. - from within Java code?
I know that there are cloud-scripts for this, but in the end they should use 
some Java client library, don't they?

This question raised because we use special configuration system (Java-based) 
to store all configuration files (not only Solr) and it'd be cool if we could
export modified files into ZooKeeper when applying changes. We would then 
reload collections remotely via REST API.

I've digged a little into ZkCli class and it seems that SolrZkClient can do 
something along the lines above. Is it the right tool for the job?

Any hints would be appreciated.

Regards,
Artem.






Setting routerField/shardKey on specific collection?

2013-12-04 Thread Daniel Bryant

Hi,

I'm using Solr 4.6 and trying to specify a router.field (shard key) on a 
specific collection so that all documents with the same value in the 
specified field end up in the same collection.


However, I can't find an example of how to do this via the solr.xml? I 
see in this ticket https://issues.apache.org/jira/browse/SOLR-5017 there 
is a mention of a routeField property.


Should the solr.xml contain the following?

<cores adminPath="/admin/cores" defaultCoreName="collection1">
  <core name="collection1" instanceDir="collection1"
        routerField="consolidationGroupId" />
</cores>

Any help would be greatly appreciated! I've been yak shaving all
afternoon reading various Jira tickets and wikis trying to get this to
work :-)


Best wishes,

Daniel


--
*Daniel Bryant  |  Software Development Consultant  | www.tai-dev.co.uk 
http://www.tai-dev.co.uk/*
daniel.bry...@tai-dev.co.uk mailto:daniel.bry...@tai-dev.co.uk  |  +44 
(0) 7799406399  |  Twitter: @taidevcouk https://twitter.com/taidevcouk


RE: json update moves doc to end

2013-12-04 Thread Andreas Owen
I changed my boost function log(clickrate)^8 to div(clicks,displays)^8 and
it works now. I get the following output from debug:

0.0022668892 = (MATCH) FunctionQuery(div(const(2),const(5))), product of:
0.4 = div(const(2),const(5))
8.0 = boost
7.0840283E-4 = queryNorm

Am I understanding this right, that 0.4 and 8.0 result in 7.084? I'm
having trouble understanding how much I boosted it.

As I use NGramFilterFactory I get a lot of hits because of the tokens. Can I
make the boost higher if the whole search term is found and not just part of
it?


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Mittwoch, 4. Dezember 2013 15:07
To: solr-user@lucene.apache.org
Subject: Re: json update moves doc to end

Well, both have a score of -Infinity. So they're equal and the tiebreaker
is the internal Lucene doc ID.

Now this is not helpful since the question now is where -Infinity comes
from, this looks suspicious:
 -Infinity = (MATCH) FunctionQuery(log(int(clicks))), product of:
-Infinity = log(int(clicks)=0)

not much help I know, but

Erick


On Wed, Dec 4, 2013 at 7:24 AM, Andreas Owen a...@conx.ch wrote:

 Hi Erick

 Here are the last 2 results from a search and I am not understanding
 why the last one with the boost editorschoice^200 isn't at the top. By
 the way, can I also give a substantial boost to results that contain
 the whole search request and not just 3 or 4 letters (tokens)?

 str name=dms:1003
 -Infinity = (MATCH) sum of:
   0.013719446 = (MATCH) max of:
 0.013719446 = (MATCH) sum of:
   2.090396E-4 = (MATCH) weight(plain_text:ber in 841) 
 [DefaultSimilarity], result of:
 2.090396E-4 = score(doc=841,freq=8.0 = termFreq=8.0 ), product 
 of:
   0.009452709 = queryWeight, product of:
 1.3343692 = idf(docFreq=611, maxDocs=855)
 0.0070840283 = queryNorm
   0.022114253 = fieldWeight in 841, product of:
 2.828427 = tf(freq=8.0), with freq of:
   8.0 = termFreq=8.0
 1.3343692 = idf(docFreq=611, maxDocs=855)
 0.005859375 = fieldNorm(doc=841)
   0.0012402858 = (MATCH) weight(plain_text:eri in 841) 
 [DefaultSimilarity], result of:
 0.0012402858 = score(doc=841,freq=9.0 = termFreq=9.0 ), 
 product of:
   0.022357063 = queryWeight, product of:
 3.1559815 = idf(docFreq=98, maxDocs=855)
 0.0070840283 = queryNorm
   0.05547624 = fieldWeight in 841, product of:
 3.0 = tf(freq=9.0), with freq of:
   9.0 = termFreq=9.0
 3.1559815 = idf(docFreq=98, maxDocs=855)
 0.005859375 = fieldNorm(doc=841)
   5.0511415E-4 = (MATCH) weight(plain_text:ric in 841) 
 [DefaultSimilarity], result of:
 5.0511415E-4 = score(doc=841,freq=1.0 = termFreq=1.0 ), 
 product of:
   0.024712078 = queryWeight, product of:
 3.4884217 = idf(docFreq=70, maxDocs=855)
 0.0070840283 = queryNorm
   0.020439971 = fieldWeight in 841, product of:
 1.0 = tf(freq=1.0), with freq of:
   1.0 = termFreq=1.0
 3.4884217 = idf(docFreq=70, maxDocs=855)
 0.005859375 = fieldNorm(doc=841)
   8.721528E-4 = (MATCH) weight(plain_text:ich in 841) 
 [DefaultSimilarity], result of:
 8.721528E-4 = score(doc=841,freq=12.0 = termFreq=12.0 ), 
 product of:
   0.017446788 = queryWeight, product of:
 2.4628344 = idf(docFreq=197, maxDocs=855)
 0.0070840283 = queryNorm
   0.049989305 = fieldWeight in 841, product of:
 3.4641016 = tf(freq=12.0), with freq of:
   12.0 = termFreq=12.0
 2.4628344 = idf(docFreq=197, maxDocs=855)
 0.005859375 = fieldNorm(doc=841)
   7.725705E-4 = (MATCH) weight(plain_text:cht in 841) 
 [DefaultSimilarity], result of:
 7.725705E-4 = score(doc=841,freq=4.0 = termFreq=4.0 ), product 
 of:
   0.021610687 = queryWeight, product of:
 3.050621 = idf(docFreq=109, maxDocs=855)
 0.0070840283 = queryNorm
   0.035749465 = fieldWeight in 841, product of:
 2.0 = tf(freq=4.0), with freq of:
   4.0 = termFreq=4.0
 3.050621 = idf(docFreq=109, maxDocs=855)
 0.005859375 = fieldNorm(doc=841)
   0.0010287998 = (MATCH) weight(plain_text:beri in 841) 
 [DefaultSimilarity], result of:
 0.0010287998 = score(doc=841,freq=1.0 = termFreq=1.0 ), 
 product of:
   0.035267927 = queryWeight, product of:
 4.978513 = idf(docFreq=15, maxDocs=855)
 0.0070840283 = queryNorm
   0.029170973 = fieldWeight in 841, product of:
 1.0 = tf(freq=1.0), with freq of:
   1.0 = termFreq=1.0
 4.978513 = idf(docFreq=15, maxDocs=855)
 0.005859375 = fieldNorm(doc=841)
   0.0010556461 = (MATCH) weight(plain_text:eric in 841) 
 [DefaultSimilarity], 

Re: Programmatically upload configuration into ZooKeeper

2013-12-04 Thread Shawn Heisey
On 12/4/2013 9:23 AM, Artem Karpenko wrote:
 so it's SolrZkClient indeed. I've tried it out and it seems to do just
 the job I need. Thank you!
 
 On a related note - is there a similar way to create/reload
 core/collection, using maybe CloudSolrServer or smth. inside it? Didn't
 found any methods that could do the thing.

This should probably work for reloading collection1.  I can't test it
right now, as I'm about to start my morning commute.

CloudSolrServer srv =
    new CloudSolrServer("zoo1:2181,zoo2:2181,zoo3:2181/mysolr");
srv.setDefaultCollection("collection2");
SolrQuery q = new SolrQuery();
q.setRequestHandler("/admin/collections");
q.set("action", "RELOAD");
q.set("name", "collection1");
QueryResponse x = srv.query(q);

If you want to reload an individual core, you'd need to use
HttpSolrServer, not CloudSolrServer.  SOLR-4140 made it possible to use
the collections API with CloudSolrServer, but as far as I can tell, it
doesn't enable the CoreAdmin API.
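A minimal sketch of the per-core reload with SolrJ 4.x (the URL and core name
are placeholders; this assumes the static CoreAdminRequest.reloadCore helper):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;

public class ReloadOneCore {
  public static void main(String[] args) throws Exception {
    // Point at the Solr root URL, not at an individual core.
    HttpSolrServer admin = new HttpSolrServer("http://localhost:8983/solr");
    // Issues a CoreAdmin RELOAD for the named core.
    CoreAdminResponse rsp = CoreAdminRequest.reloadCore("collection1", admin);
    System.out.println("reload status: " + rsp.getStatus());
    admin.shutdown();
  }
}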

Note that reloads don't work right with SolrCloud unless the server
version is at least 4.4, due to a bug.

Thanks,
Shawn



Re: SolrCloud FunctionQuery inconsistency

2013-12-04 Thread Chris Hostetter

: There is no default value for ptime. It is generated by users.

thank you, that rules out my previous wild guess.

: I was trying query with a function query({!boost b=dateDeboost(ptime)}
: channelid:0082  title:abc), which leads differents results from the same
: shard(using the param: shards=shard3).
: 
: The diffenence is maxScore, which is not consistent. And the maxScore is

Ok ... but you still haven't provided enough information for us to make a
guess as to why you are seeing inconsistent scores coming back from your
queries -- at a minimum we need to see the debugQuery=true output for each
of the different replicas that are generating different scores.

It's possible that the discrepancy you are seeing is a minor one resulting
from slightly different term stats (ie: segments being merged slightly
differently on different replicas), or it could be a symptom of a larger
problem.



-Hoss
http://www.lucidworks.com/


Re: json update moves doc to end

2013-12-04 Thread Chris Hostetter

: Well, both have a score of -Infinity. So they're equal and
: the tiebreaker is the internal Lucene doc ID.
: 
: Now this is not helpful since the question now is where
: -Infinity comes from, this looks suspicious:
:  -Infinity = (MATCH) FunctionQuery(log(int(clicks))), product of:
: -Infinity = log(int(clicks)=0)

If the score of this doc was not -Infinity before your doc update, and 
it became -Infinity after your update, and your update did not 
intentionally change the value of the clicks field to 0 then i suspect 
what you are seeing is the result of not having all of your fields as 
stored=true...

https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

   /!\ All original source fields must be stored for field modifiers to 
   work correctly, which is the Solr default
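For illustration only (the id and field name below are just taken from this
thread), an atomic update request looks something like this, and the value of
any field that is not stored is lost when Solr rebuilds the document to apply
it:

curl 'http://localhost:8983/solr/collection1/update/json?commit=true' \
  -H 'Content-type: application/json' \
  -d '[{"id":"1003", "clicks":{"inc":1}}]'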

-Hoss
http://www.lucidworks.com/


Re: Questions about commits and OOE

2013-12-04 Thread Daniel Collins
I'd second the use of jstack to check your threads.  Each request (be it a
search or an update) will generate a request handler thread on the Solr side
unless you've set the limits in the HttpShardHandlerFactory (in solr.xml for
Solr-wide defaults and/or under the requestHandler in solrconfig.xml). We
set maxConnectionsPerHost, corePoolSize and maximumPoolSize, since we ran
into a similar issue.
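For reference, that configuration is roughly of this shape (element placement
and the numbers are illustrative only; check the HttpShardHandlerFactory docs
for your version rather than copying these values):

<shardHandlerFactory name="shardHandlerFactory"
                     class="HttpShardHandlerFactory">
  <int name="socketTimeout">60000</int>
  <int name="connTimeout">15000</int>
  <int name="maxConnectionsPerHost">20</int>
  <int name="corePoolSize">0</int>
  <int name="maximumPoolSize">20</int>
</shardHandlerFactory>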

Our system ironically didn't crash, we just had a JVM with about 256000
threads, which was rather SSLLOOWW :)

On the softCommit front, we have had some success with small softCommit
times, but then we use SSDs (and have lots of memory and lots of shards).
Once we get concrete figures, we'll publish them, but we are a fair way
below 1s now with no major impact on indexing throughput (yet).  But I
would agree that unless you are really really sure you need it (and most
people don't), keep to the known limits.


On 4 December 2013 16:09, Tim Potter tim.pot...@lucidworks.com wrote:

 Hi Metin,

 I think removing the softCommit=true parameter on the client side will
 definitely help as NRT wasn't designed to re-open searchers after every
 document. Try every 1 second (or even every few seconds), I doubt your
 users will notice. To get an idea of what threads are running in your JVM
 process, you can use jstack.

 Cheers,

 Timothy Potter
 Sr. Software Engineer, LucidWorks
 www.lucidworks.com

 
 From: OSMAN Metin metin.os...@canal-plus.com
 Sent: Wednesday, December 04, 2013 7:36 AM
 To: solr-user@lucene.apache.org
 Subject: Questions about commits and OOE

 Hi all,

 let me first explain our situation :

 We have


 -   two virtual servers with each :

 4x SolR 4.4.0 on Tomcat 6 (+ with mod_cluster 1.2.0), each JVM has
 -Xms2048m -Xmx2048m -XX:MaxPermSize=384m
 1x Zookeeper 3.4.5 (Only one of the two Zookeeper is active.)
 CentOS 6.4
 Sun JDK 1.6.0-31
 16 GB of RAM
 4 vCPU


 -   only one core and one shard

 -   ~25 docs and 50-100 MB of index size

 -   two load balancers (apache + mod_cluster) who are both connected
 to the 8 SolR nodes

 -   1 VIP pointing to these two LB

 The commit configuration is

 -   every update request do a soft commit (i.e. param softCommit=true
 in the http request)

 -   autosoftcommit disabled

 -   autocommit enabled every 15 seconds

 The client application is a java app with SolRj client using the previous
 VIP as an endpoint.
 We need NearRealTime modifications visible by the end users.
 During the day, the client uses SolR with about 80% of select requests and
 20% of update requests.
 Every morning, the client is sending a massive bunch of updates (about
 1 in a few minutes).

 During this massive update, we have sometimes a peak of active threads
 exceeding the limit of 8192 process authorized for the user running the
 tomcat and zookeeper process.
 When this happens, every hardCommit is failing with an OutOfMemory :
 unable to create native thread message.


 Now, I have some questions :

 -   Why are there some many threads created ? Is the softCommit on
 every update that opens a new thread ?

 -   Once an OOE occurs, every hardcommit will be broken, even if the
 number of threads opened on the system is low. Is there any way to free
 the JVM ? The only solution we have found is to restart all the JVM.

 -   When the OOE occurs, the SolR cloud console shows the leader node
 as active and the others as recovering

 o   is the replication working at that moment ?

 o   as all the hardcommits are failing but the softcommits not, am I very
 sure that I will not lose some updates when restarting all the nodes ?

 By the way, we are planning to

 -   disable the softCommit parameter on the client side and to enable
 the autosoftcommit instead.

 -   create another server and make 3 zookeeper chorum instead of a
 unique zookeeper master.

 -   skip the use of load balancers and let zookeeper decide which node
 will respond to the requests

 Any help would be appreciated !

 Metin OSMAN



RE: Setting routerField/shardKey on specific collection?

2013-12-04 Thread Tim Potter
Hi Daniel,

I'm not sure how this would apply to an existing collection (in your case 
collection1). Try using the collections API to create a new collection and pass 
the router.field parameter. Grep'ing over the code, the parameter is named: 
router.field (not routerField or routeField).
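A sketch of that create call (the collection name, shard counts and config
name below are just placeholders):

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&collection.configName=myconf&router.name=compositeId&router.field=consolidationGroupId'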

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: Daniel Bryant daniel.bry...@tai-dev.co.uk
Sent: Wednesday, December 04, 2013 9:40 AM
To: solr-user@lucene.apache.org
Subject: Setting routerField/shardKey on specific collection?

Hi,

I'm using Solr 4.6 and trying to specify a router.field (shard key) on a
specific collection so that all documents with the same value in the
specified field end up in the same collection.

However, I can't find an example of how to do this via the solr.xml? I
see in this ticket https://issues.apache.org/jira/browse/SOLR-5017 there
is a mention of a routeField property.

Should the solr.xml contain the following?

cores adminPath=/admin/cores defaultCoreName=collection1
 core name=collection1 instanceDir=collection1
routerField=consolidationGroupId /
/cores

Any help would be greatly appreciate? I've been yak shaving all
afternoon reading various Jira tickets and wikis trying to get this to
work :-)

Best wishes,

Daniel


--
*Daniel Bryant  |  Software Development Consultant  | www.tai-dev.co.uk
http://www.tai-dev.co.uk/*
daniel.bry...@tai-dev.co.uk mailto:daniel.bry...@tai-dev.co.uk  |  +44
(0) 7799406399  |  Twitter: @taidevcouk https://twitter.com/taidevcouk


Re: Setting routerField/shardKey on specific collection?

2013-12-04 Thread Daniel Bryant
Many thanks Timothy, I tried this today but ran into issues getting the 
new collection to persist (so that I could search for the parameter). 
It's good to have this confirmed as a viable approach though, and I'll 
persevere with this tomorrow.


If I figure it out I'll reply with the details.

Thanks again,

Daniel


On 04/12/2013 17:41, Tim Potter wrote:

Hi Daniel,

I'm not sure how this would apply to an existing collection (in your case 
collection1). Try using the collections API to create a new collection and pass 
the router.field parameter. Grep'ing over the code, the parameter is named: 
router.field (not routerField or routeField).

Cheers,

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: Daniel Bryant daniel.bry...@tai-dev.co.uk
Sent: Wednesday, December 04, 2013 9:40 AM
To: solr-user@lucene.apache.org
Subject: Setting routerField/shardKey on specific collection?

Hi,

I'm using Solr 4.6 and trying to specify a router.field (shard key) on a
specific collection so that all documents with the same value in the
specified field end up in the same collection.

However, I can't find an example of how to do this via the solr.xml? I
see in this ticket https://issues.apache.org/jira/browse/SOLR-5017 there
is a mention of a routeField property.

Should the solr.xml contain the following?

cores adminPath=/admin/cores defaultCoreName=collection1
  core name=collection1 instanceDir=collection1
routerField=consolidationGroupId /
/cores

Any help would be greatly appreciate? I've been yak shaving all
afternoon reading various Jira tickets and wikis trying to get this to
work :-)

Best wishes,

Daniel


--
*Daniel Bryant  |  Software Development Consultant  | www.tai-dev.co.uk
http://www.tai-dev.co.uk/*
daniel.bry...@tai-dev.co.uk mailto:daniel.bry...@tai-dev.co.uk  |  +44
(0) 7799406399  |  Twitter: @taidevcouk https://twitter.com/taidevcouk


--
*Daniel Bryant  |  Software Development Consultant  | www.tai-dev.co.uk 
http://www.tai-dev.co.uk/*
daniel.bry...@tai-dev.co.uk mailto:daniel.bry...@tai-dev.co.uk  |  +44 
(0) 7799406399  |  Twitter: @taidevcouk https://twitter.com/taidevcouk


Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent

2013-12-04 Thread Mhd Wrk
Debug shows that all terms are lowercased properly.

Thanks
On Dec 4, 2013 3:18 AM, Erik Hatcher erik.hatc...@gmail.com wrote:

 Chances are you're not getting those fuzzy terms analyzed as you'd like.
  See debug (debug=true) output to be sure.  Most likely the fuzzy terms
 are not being lowercased.  See
 http://wiki.apache.org/solr/MultitermQueryAnalysis for more details (this
 applies to fuzzy, not just wildcard) terms too.

 Erik


 On Dec 4, 2013, at 4:46 AM, Mhd Wrk mhd...@gmail.com wrote:

  I'm using the following query to do a fuzzy search on Solr 4.5.1 and am
  getting empty result.
 
  qt=standardq=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2)
  +(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO
  2013-12-04T00:23:00Z] -endDate:[* TO
  2013-12-04T00:23:00Z])start=0rows=10fl=id
 
  If I change it to a not fuzzy query by simply dropping tildes from the
  terms (see below) then it returns the expected result! Is this a bug?
  Shouldn't fuzzy version of a query always return a super set of its
  not-fuzzy equivalent?
 
  qt=standardq=+(field1|en_CA|:Swimming field1|en|:Swimming)
  +(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO
  2013-12-04T00:23:00Z] -endDate:[* TO
  2013-12-04T00:23:00Z])start=0rows=10fl=id




Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread steven crichton
I am finding, with a bulk index using Solr 4.3 on Tomcat, that when I reach
69578 records the server stops adding anything more.

I've tried reducing the data sent to the bare minimum of fields and using
ASC and DESC data to see if it could be a field issue.

Is there anything I could look at for this, as I'm not finding anything
similar noted before? Does Tomcat have issues with closing connections that
look like DDOS attacks? Or could it be related to too many commits in too
short a time?

Any help will be very greatly appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Stalls-on-Bulk-indexing-no-logs-or-errors-tp4104981.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: a core for every user, lots of users... are there issues

2013-12-04 Thread hank williams
Ok one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we
were *trying* to use the REST API function create to create cores without
having to manually mess with files on the server. Is this what create was
supposed to do? If so it was broken or we weren't using it right. In any
case in 4.6 is that the right way to programmatically add cores in
discovery mode?


On Tue, Dec 3, 2013 at 7:37 PM, Erick Erickson erickerick...@gmail.comwrote:

 bq: Do you have any sense of what a good upper limit might be, or how we
 might figure that out?

 As always, it depends (tm). And the biggest thing it depends upon is the
 number of simultaneous users you have and the size of their indexes. And
 we've arrived at the black box of estimating size again. Siiihh... I'm
 afraid that the only way is to test and establish some rules of thumb.

 The transient core constraint will limit the number of cores loaded at
 once. If you allow too many cores at once, you'll get OOM errors when all
 the users pile on at the same time.

 Let's say you've determined that 100 is the limit for transient cores. What
 I suspect you'll see is degrading response times if this is too low. Say
 110 users are signed on and say they submit queries perfectly in order, one
 after the other. Every request will require the core to be opened and it'll
 take a bit. So that'll be a flag.

 Or that's a fine limit but your users have added more and more documents
 and you're coming under memory pressure.

 As you can tell I don't have any good answers. I've seen between 10M and
 300M documents on a single machine

 BTW, on a _very_ casual test I found about 1000 cores/second were found in
 discovery mode. While they aren't loaded if they're transient, it's still a
 consideration if you have 10s of thousands.

 Best,
 Erick



 On Tue, Dec 3, 2013 at 3:33 PM, hank williams hank...@gmail.com wrote:

  On Tue, Dec 3, 2013 at 3:20 PM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   You probably want to look at transient cores, see:
   http://wiki.apache.org/solr/LotsOfCores
  
   But millions will be interesting for a single node, you must have
 some
   kind of partitioning in mind?
  
  
  Wow. Thanks for that great link. Yes we are sharding so its not like
 there
  would be millions of cores on one machine or even cluster. And since the
  cores are one per user, this is a totally clean approach. But still we
 want
  to make sure that we are not overloading the machine. Do you have any
 sense
  of what a good upper limit might be, or how we might figure that out?
 
 
 
   Best,
   Erick
  
  
   On Tue, Dec 3, 2013 at 2:38 PM, hank williams hank...@gmail.com
 wrote:
  
 We are building a system where there is a core for every user. There
   will
be many tens or perhaps ultimately hundreds of thousands or millions
 of
users. We do not need each of those users to have “warm” data in
  memory.
   In
fact doing so would consume lots of memory unnecessarily, for users
  that
might not have logged in in a long time.
   
So my question is, is the default behavior of Solr to try to keep all
  of
our cores warm, and if so, can we stop it? Also given the number of
  cores
that we will likely have is there anything else we should be keeping
 in
mind to maximize performance and minimize memory usage?
   
  
 
 
 
  --
  blog: whydoeseverythingsuck.com
 




-- 
blog: whydoeseverythingsuck.com


Re: a core for every user, lots of users... are there issues

2013-12-04 Thread Shawn Heisey

On 12/4/2013 12:34 PM, hank williams wrote:

Ok one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we
were *trying* to use the rest API function create to create cores without
having to manually mess with files on the server. Is this what create was
supposed to do? If so it was borken or we werent using it right. In any
case in 4.6 is that the right way to programmatically add cores in
discovery mode?


If you are NOT in SolrCloud mode, in order to create new cores, the 
config files need to already exist on the disk.  This is the case with 
all versions of Solr.
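For example (hypothetical name and path; the instanceDir and its conf/ must
already exist on disk), creating such a core is then just a CoreAdmin call:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=user123&instanceDir=user123'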


If you're running in SolrCloud mode, the core is associated with a
collection.  Collections have a link to a config in zookeeper.  The
config is not stored with the core on the disk.


Thanks,
Shawn



Re: Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread Erick Erickson
There's a known issue with SolrCloud with multiple shards, but
you haven't told us whether you're using that. The test for
whether you're running into that is whether you can continue
to _query_, just not update.

But you need to tell us more about your setup. In particular
your commit settings (hard and soft), your solrconfig settings,
particularly around autowarming, how you're bulk indexing:
SolrJ? DIH? a huge CSV file?

Best,
Erick


On Wed, Dec 4, 2013 at 2:30 PM, steven crichton stevencrich...@mac.comwrote:

 I am finding with a bulk index using SOLR 4.3 on Tomcat, that when I reach
 69578 records the server stops adding anything more.

 I've tried reducing the data sent to the bare minimum of fields and using
 ASC and DESC data to see if it could be a field issue.

 Is there anything I could look at for this? As I'm not finding anything
 similar noted before. Does tomcat have issues with closing connections that
 look like DDOS attacks? Or could it be related to too many commits in too
 short a time?

 Any help will be very greatly appreciated.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-Stalls-on-Bulk-indexing-no-logs-or-errors-tp4104981.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Tika not extracting content from ODT / ODS (open document / libreoffice) in Solr 4.2.1

2013-12-04 Thread Augusto Camarotti
Hello everybody,
 
First of all, sorry about my bad English.

To give an update on this bug, I may have found a solution for it, and I
would like opinions on it.
I have found out that Tika, when reading .odt files, returns more than one
document: the first one for content.xml, which has the actual content of the
file, and the second one for styles.xml.
To test this, try modifying an .odt file by removing styles.xml; Solr should
then parse its contents normally.
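An .odt file is just a zip archive, so removing that entry can be done with,
for example:

zip -d test.odt styles.xml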
When Solr receives the second document (styles.xml), it erases anything it
has read before. In general, styles.xml doesn't have any text in it, so Solr
ends up with just some spaces.
I modified the function inside 'SolrContentHandler.java' that erases the
content of the first document, so that it only appends a whitespace and does
not erase any previous content; that way every document Tika returns is
accumulated.
I believe this behaviour still works for the previous cases, but I need
your opinion about this.
 
Here is the only modification i made on 'SolrContentHandler.java' 
 
  @Override
  public void startDocument() throws SAXException {
    document.clear();
    //catchAllBuilder.setLength(0);
    // Augusto Camarotti - 28-11-2013
    // As Tika may parse more than one document in one file, I have to append
    // every document Tika passes me, so I only append a whitespace and wait
    // for new content every time. Otherwise, Solr would just get the last
    // document of the file.
    catchAllBuilder.append(' ');
    for (StringBuilder builder : fieldBuilders.values()) {
      builder.setLength(0);
    }
    bldrStack.clear();
    bldrStack.add(catchAllBuilder);
  }
 
 
Regards, 
 
Augusto Camarotti

 Alexandre Rafalovitch arafa...@gmail.com 10/05/2013 21:13 
I would try DIH with the flags as in jira issue I linked to. If
possible.
Just in case.

Regards,
Alex
On 10 May 2013 19:53, Sebastián Ramírez
sebastian.rami...@senseta.com
wrote:

 OK Jack, I'll switch to MS Office ...hahaha

 Many thanks for your interest and help... and the bug report in
JIRA.

 Best,

 Sebastián Ramírez


 On Fri, May 10, 2013 at 5:48 PM, Jack Krupansky
j...@basetechnology.com
 wrote:

  I filed  SOLR-4809 - OpenOffice document body is not indexed by
  SolrCell, including some test files.
 
  https://issues.apache.org/**jira/browse/SOLR-4809
 https://issues.apache.org/jira/browse/SOLR-4809
 
  Yeah, at this stage, switching to Microsoft Office seems like the
best
 bet!
 
 
  -- Jack Krupansky
 
  -Original Message- From: Sebastián Ramírez
  Sent: Friday, May 10, 2013 6:33 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Tika not extracting content from ODT / ODS (open
document /
  libreoffice) in Solr 4.2.1
 
 
  Many thanks Jack for your attention and effort on solving the
problem.
 
  Best,
 
  Sebastián Ramírez
 
 
  On Fri, May 10, 2013 at 5:23 PM, Jack Krupansky
j...@basetechnology.com
 *
  *wrote:
 
   I downloaded the latest Apache OpenOffice 3.4.1 and it does in
fact fail
  to index the proper content, both for .ODP and .ODT files.
 
  If I do extractOnly=trueextractFormat=text, I see the
extracted
 text
 
  clearly in addition to the metadata.
 
  I tested on 4.3, and then tested on Solr 3.6.1 and it also
exhibited the
  problem. I just see spaces in both cases.
 
  But whether the problem is due to Solr or Tika, is not apparent.
 
  In any case, a Jira is warranted.
 
 
  -- Jack Krupansky
 
  -Original Message- From: Sebastián Ramírez
  Sent: Friday, May 10, 2013 11:24 AM
  To: solr-user@lucene.apache.org
  Subject: Tika not extracting content from ODT / ODS (open document
/
  libreoffice) in Solr 4.2.1
 
  Hello everyone,
 
  I'm having a problem indexing content from opendocument format
files.
  The
  files created with OpenOffice and LibreOffice (odt, ods...).
 
  Tika is being able to read the files but Solr is not indexing the
 content.
 
  It's not a problem of commiting or something like that, after I
post a
  file
  it is indexed and all the metadata is indexed/stored but the
content
 isn't
  there.
 
 
- I modified the solrconfig.xml file to catch everything:
 
 
  <requestHandler name="/update/extract" ...>

    <!-- here is the interesting part -->

    <!-- <str name="uprefix">ignored_</str> -->
    <str name="defaultField">all_txt</str>
 
 
 
 
- Then I submitted the file to Solr:
 
 
  curl 'http://localhost:8983/solr/update/extract?commit=true&literal.id=newods'
  -H 'Content-type: application/vnd.oasis.opendocument.spreadsheet'
  --data-binary @test_ods.ods
 
 
 
- Now when I do a search in Solr I get this result, there is
something
 
in the content, but that's not the actual content of the
original
  file:
 
  result name=response numFound=1 start=0
   doc
 str 

facet.method=fcs vs facet.method=fc on solr slaves

2013-12-04 Thread Patrick O'Lone
Is there any advantage on a Solr slave to receive queries using
facet.method=fcs instead of the default of facet.method=fc? Most of the
segment files are unchanged between replication events - but I wasn't
sure if replication would cause the unchanged segment field caches to be
lost anyway.
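For concreteness, I mean requests of roughly this form (the host, core and
field names are just examples):

http://slave:8983/solr/core/select?q=*:*&facet=true&facet.field=section&facet.method=fcs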
-- 
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone  309-743-0809
Fax .. 309-743-0830


Re: a core for every user, lots of users... are there issues

2013-12-04 Thread hank williams
Super helpful. Thanks.


On Wed, Dec 4, 2013 at 2:53 PM, Shawn Heisey s...@elyograg.org wrote:

 On 12/4/2013 12:34 PM, hank williams wrote:

 Ok one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we
 were *trying* to use the rest API function create to create cores
 without
 having to manually mess with files on the server. Is this what create
 was
 supposed to do? If so it was borken or we werent using it right. In any
 case in 4.6 is that the right way to programmatically add cores in
 discovery mode?


 If you are NOT in SolrCloud mode, in order to create new cores, the config
 files need to already exist on the disk.  This is the case with all
 versions of Solr.

 If you're running in SolrCloud mode, the core is associated with a
 collection.  Collections have a link to aconfig in zookeeper.  The config
 is not stored with the core on the disk.

 Thanks,
 Shawn




-- 
blog: whydoeseverythingsuck.com


Re: Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread steven crichton
Yes, I can continue to query after this importer goes down and whilst it is running.

The bulk commit is done via a JSON handler in PHP. There are 121,000 records
that need to go into the index, so this is done in 5000 chunked MySQL retrieve
calls, parsing the data as required.

workflow:

get record
create {add doc… } JSON
Post to CORE/update/json
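Each POST looks roughly like this (the URL, id and title are placeholders for
the real values):

curl 'http://localhost:8080/solr/CORE/update/json' \
  -H 'Content-type: application/json' \
  -d '[{"id":"12345","title":"Example title"}]'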


I stopped doing a hard commit every 1000 records, to see if that was an issue.


the auto commit settings are ::

<autoCommit>
  <maxDocs>${solr.autoCommit.MaxDocs:5000}</maxDocs>
  <maxTime>${solr.autoCommit.MaxTime:24000}</maxTime>
</autoCommit>


I’ve pretty much worked from the Drupal schemas for Solr 4:
https://drupal.org/project/apachesolr

At one point I thought it could be malformed data, but even after reducing the
records down to just the id and title, it crashes at the same point; that is,
queries still work but the import handler does nothing at all.


Tomcat logs seem to indicate no major issues.


There isn’t some variable set somewhere that imposes an upper limit on the
index, is there?

Regards,
Steven



On 4 Dec 2013, at 20:02, Erick Erickson [via Lucene] 
ml-node+s472066n4104984...@n3.nabble.com wrote:

 There's a known issue with SolrCloud with multiple shards, but 
 you haven't told us whether you're using that. The test for 
 whether you're running in to that is whether you can continue 
 to _query_, just not update. 
 
 But you need to tell us more about our setup. In particular 
 hour commit settings (hard and soft), your solrconfig settings, 
 particularly around autowarming, how you're bulk indexing, 
 SolrJ? DIH? a huge CSV file? 
 
 Best, 
 Erick 
 
 
 On Wed, Dec 4, 2013 at 2:30 PM, steven crichton [hidden email]wrote: 
 
  I am finding with a bulk index using SOLR 4.3 on Tomcat, that when I reach 
  69578 records the server stops adding anything more. 
  
  I've tried reducing the data sent to the bare minimum of fields and using 
  ASC and DESC data to see if it could be a field issue. 
  
  Is there anything I could look at for this? As I'm not finding anything 
  similar noted before. Does tomcat have issues with closing connections that 
  look like DDOS attacks? Or could it be related to too many commits in too 
  short a time? 
  
  Any help will be very greatly appreciated. 
  
  
  
  -- 
  View this message in context: 
  http://lucene.472066.n3.nabble.com/Solr-Stalls-on-Bulk-indexing-no-logs-or-errors-tp4104981.html
  Sent from the Solr - User mailing list archive at Nabble.com. 
  
 
 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Stalls-on-Bulk-indexing-no-logs-or-errors-tp4104981p4104990.html
Sent from the Solr - User mailing list archive at Nabble.com.

Querying for results

2013-12-04 Thread Rob Veliz
Hello,

I am running Solr from Magento and using DIH to import/index data from 1
other source (external).  I am trying to query for results... a few questions:

1. The query I'm using runs against fulltext_1_en which is a specific
shard created by the Magento deployment in solrconfig.xml.  Should I be
using/querying from another field/store (e.g. not fulltext_1*) to get
results from both Magento and the other data source?  How would I add the
data from my DIH indexing to that specific shard so it was all in the same
place?

2. OR do I need to add another shard to correspond to the DIH data elements?

3. OR is there something else I'm missing in trying to query for data from
2 sources?

Thanks!


starting up solr automatically

2013-12-04 Thread Eric Palmer
Hey all,

I'm pretty new to solr.  I'm installing it on an amazon linux (rpm based)
 ec2 instance and have it running. I even have nutch feeding it pages from
a crawl. I'm very happy about that.

I want solr to start on a reboot and am following the instructions at
http://wiki.apache.org/solr/SolrJetty#Starting

I'm using solr 4.5.1 and when I check the jetty version I get this

java -jar start.jar --version
Active Options: [default, *]
Version Information on 17 entries in the classpath.
Note: order presented here is how they would appear on the classpath.
  changes to the OPTIONS=[option,option,...] command line option will
be reflected here.
 0:(dir) | ${jetty.home}/resources
 1: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-xml-8.1.10.v20130312.jar
 2:  3.0.0.v201112011016 | ${jetty.home}/lib/servlet-api-3.0.jar
 3: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-http-8.1.10.v20130312.jar
 4: 8.1.10.v20130312 |
${jetty.home}/lib/jetty-continuation-8.1.10.v20130312.jar
 5: 8.1.10.v20130312 |
${jetty.home}/lib/jetty-server-8.1.10.v20130312.jar
 6: 8.1.10.v20130312 |
${jetty.home}/lib/jetty-security-8.1.10.v20130312.jar
 7: 8.1.10.v20130312 |
${jetty.home}/lib/jetty-servlet-8.1.10.v20130312.jar
 8: 8.1.10.v20130312 |
${jetty.home}/lib/jetty-webapp-8.1.10.v20130312.jar
 9: 8.1.10.v20130312 |
${jetty.home}/lib/jetty-deploy-8.1.10.v20130312.jar
10:1.6.6 | ${jetty.home}/lib/ext/jcl-over-slf4j-1.6.6.jar
11:1.6.6 | ${jetty.home}/lib/ext/jul-to-slf4j-1.6.6.jar
12:   1.2.16 | ${jetty.home}/lib/ext/log4j-1.2.16.jar
13:1.6.6 | ${jetty.home}/lib/ext/slf4j-api-1.6.6.jar
14:1.6.6 | ${jetty.home}/lib/ext/slf4j-log4j12-1.6.6.jar
15: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-util-8.1.10.v20130312.jar
16: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-io-8.1.10.v20130312.jar

the instructions reference a jetty.sh script for version 6 and a different
one for 7. Does the version 7 one work with jetty 8? If not where can I get
the one for version 8?

BTW - this is just the standard install of solr from the gzip file.

thanks in advance for your help.

-- 
Eric Palmer
U of Richmond


Re: starting up solr automatically

2013-12-04 Thread Greg Walters
I found the instructions and scripts on that page to be unclear and/or not 
work. Here's the script I've been using for solr 4.5.1: 
https://gist.github.com/gregwalters/7795791 Do note that you'll have to change 
a couple of paths to get things working correctly.

Thanks,
Greg

On Dec 4, 2013, at 3:15 PM, Eric Palmer e...@ericfpalmer.com wrote:

 Hey all,
 
 I'm pretty new to solr.  I'm installing it on an amazon linux (rpm based)
 ec2 instance and have it running. I even have nutch feeding it pages from
 a crawl. I'm very happy about that.
 
 I want solr to start on a reboot and am following the instructions at
 http://wiki.apache.org/solr/SolrJetty#Starting
 
 I'm using solr 4.5.1 and when I check the jetty version I get this
 
 java -jar start.jar --version
 Active Options: [default, *]
 Version Information on 17 entries in the classpath.
 Note: order presented here is how they would appear on the classpath.
  changes to the OPTIONS=[option,option,...] command line option will
 be reflected here.
 0:(dir) | ${jetty.home}/resources
 1: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-xml-8.1.10.v20130312.jar
 2:  3.0.0.v201112011016 | ${jetty.home}/lib/servlet-api-3.0.jar
 3: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-http-8.1.10.v20130312.jar
 4: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-continuation-8.1.10.v20130312.jar
 5: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-server-8.1.10.v20130312.jar
 6: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-security-8.1.10.v20130312.jar
 7: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-servlet-8.1.10.v20130312.jar
 8: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-webapp-8.1.10.v20130312.jar
 9: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-deploy-8.1.10.v20130312.jar
 10:1.6.6 | ${jetty.home}/lib/ext/jcl-over-slf4j-1.6.6.jar
 11:1.6.6 | ${jetty.home}/lib/ext/jul-to-slf4j-1.6.6.jar
 12:   1.2.16 | ${jetty.home}/lib/ext/log4j-1.2.16.jar
 13:1.6.6 | ${jetty.home}/lib/ext/slf4j-api-1.6.6.jar
 14:1.6.6 | ${jetty.home}/lib/ext/slf4j-log4j12-1.6.6.jar
 15: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-util-8.1.10.v20130312.jar
 16: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-io-8.1.10.v20130312.jar
 
 the instructions reference a jetty.sh script for version 6 and a different
 one for 7. Does the version 7 one work with jetty 8? If not where can I get
 the one for version 8?
 
 BTW - this is just the standard install of solr from the gzip file.
 
 thanks in advance for your help.
 
 -- 
 Eric Palmer
 U of Richmond



Re: Querying for results

2013-12-04 Thread Rob Veliz
Follow-up: Would anyone very familiar with DIH be willing to jump on a side
thread with me and my developer to help troubleshoot some issues we're
having?  Please little r me at: robert [at] mavenbridge.com.  Thanks!




On Wed, Dec 4, 2013 at 1:14 PM, Rob Veliz rob...@mavenbridge.com wrote:

 Hello,

 I am running Solr from Magento and using DIH to import/index data from 1
 other source (external).  I am trying to query for results...two questions:

 1. The query I'm using runs against fulltext_1_en which is a specific
 shard created by the Magento deployment in solrconfig.xml.  Should I be
 using/querying from another field/store (e.g. not fulltext_1*) to get
 results from both Magento and the other data source?  How would I add the
 data from my DIH indexing to that specific shard so it was all in the same
 place?

 2. OR do I need to add another shard to correspond to the DIH data
 elements?

 3. OR is there something else I'm missing in trying to query for data from
 2 sources?

 Thanks!





-- 
*Rob Veliz*, Founder | *Mavenbridge* | rob...@mavenbridge.com | M: +1 (206)
909 - 3490

Follow us at: http://twitter.com/mavenbridge


Re: starting up solr automatically

2013-12-04 Thread Greg Walters
I almost forgot, you'll need a file to setup the environment a bit too:

**
JAVA_HOME=/usr/java/default
JAVA_OPTIONS="-Xmx15g \
-Xms15g \
-XX:+PrintGCApplicationStoppedTime \
-XX:+PrintGCDateStamps \
-XX:+PrintGCDetails \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:+UseTLAB \
-XX:+CMSParallelRemarkEnabled \
-XX:+CMSScavengeBeforeRemark \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSWaitDuration=30 \
-XX:GCTimeRatio=40 \
-Xloggc:/tmp/solr45_gc.log \
-Dbootstrap_conf=true \
-Dbootstrap_confdir=/var/lib/answers/atlascloud/solr45/solr/wa-en-collection_1/conf/
 \
-Dcollection.configName=wa-en-collection \
-DzkHost=hosts \
-DnumShards=shards \
-Dsolr.solr.home=/var/lib/answers/atlascloud/solr45/solr/ \
-Dlog4j.configuration=file:///var/lib/answers/atlascloud/solr45/resources/log4j.properties
 \
-Djetty.port=9101 \
$JAVA_OPTIONS"
JETTY_HOME=/var/lib/answers/atlascloud/solr45/
JETTY_USER=tomcat
JETTY_LOGS=/var/lib/answers/atlascloud/solr45/logs
**

On Dec 4, 2013, at 3:21 PM, Greg Walters greg.walt...@answers.com wrote:

 I found the instructions and scripts on that page to be unclear and/or not 
 work. Here's the script I've been using for solr 4.5.1: 
 https://gist.github.com/gregwalters/7795791 Do note that you'll have to 
 change a couple of paths to get things working correctly.
 
 Thanks,
 Greg
 
 On Dec 4, 2013, at 3:15 PM, Eric Palmer e...@ericfpalmer.com wrote:
 
 Hey all,
 
 I'm pretty new to solr.  I'm installing it on an amazon linux (rpm based)
 ec2 instance and have it running. I even have nutch feeding it pages from
 a crawl. I'm very happy about that.
 
 I want solr to start on a reboot and am following the instructions at
 http://wiki.apache.org/solr/SolrJetty#Starting
 
 I'm using solr 4.5.1 and when I check the jetty version I get this
 
 java -jar start.jar --version
 Active Options: [default, *]
 Version Information on 17 entries in the classpath.
 Note: order presented here is how they would appear on the classpath.
 changes to the OPTIONS=[option,option,...] command line option will
 be reflected here.
 0:(dir) | ${jetty.home}/resources
 1: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-xml-8.1.10.v20130312.jar
 2:  3.0.0.v201112011016 | ${jetty.home}/lib/servlet-api-3.0.jar
 3: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-http-8.1.10.v20130312.jar
 4: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-continuation-8.1.10.v20130312.jar
 5: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-server-8.1.10.v20130312.jar
 6: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-security-8.1.10.v20130312.jar
 7: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-servlet-8.1.10.v20130312.jar
 8: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-webapp-8.1.10.v20130312.jar
 9: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-deploy-8.1.10.v20130312.jar
 10:1.6.6 | ${jetty.home}/lib/ext/jcl-over-slf4j-1.6.6.jar
 11:1.6.6 | ${jetty.home}/lib/ext/jul-to-slf4j-1.6.6.jar
 12:   1.2.16 | ${jetty.home}/lib/ext/log4j-1.2.16.jar
 13:1.6.6 | ${jetty.home}/lib/ext/slf4j-api-1.6.6.jar
 14:1.6.6 | ${jetty.home}/lib/ext/slf4j-log4j12-1.6.6.jar
 15: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-util-8.1.10.v20130312.jar
 16: 8.1.10.v20130312 | ${jetty.home}/lib/jetty-io-8.1.10.v20130312.jar
 
 the instructions reference a jetty.sh script for version 6 and a different
 one for 7. Does the version 7 one work with jetty 8? If not where can I get
 the one for version 8?
 
 BTW - this is just the standard install of solr from the gzip file.
 
 thanks in advance for your help.
 
 -- 
 Eric Palmer
 U of Richmond
 



Re: starting up solr automatically

2013-12-04 Thread Eric Palmer
Thanks Greg,

I got it starting but the collection file is not available. I will use the
script that you gave the URL for and the env settings. Thanks


On Wed, Dec 4, 2013 at 4:26 PM, Greg Walters greg.walt...@answers.comwrote:

 I almost forgot, you'll need a file to setup the environment a bit too:

 **
 JAVA_HOME=/usr/java/default
 JAVA_OPTIONS=-Xmx15g \
 -Xms15g \
 -XX:+PrintGCApplicationStoppedTime \
 -XX:+PrintGCDateStamps \
 -XX:+PrintGCDetails \
 -XX:+UseConcMarkSweepGC \
 -XX:+UseParNewGC \
 -XX:+UseTLAB \
 -XX:+CMSParallelRemarkEnabled \
 -XX:+CMSScavengeBeforeRemark \
 -XX:+UseCMSInitiatingOccupancyOnly \
 -XX:CMSInitiatingOccupancyFraction=50 \
 -XX:CMSWaitDuration=30 \
 -XX:GCTimeRatio=40 \
 -Xloggc:/tmp/solr45_gc.log \
 -Dbootstrap_conf=true \
 -Dbootstrap_confdir=/var/lib/answers/atlascloud/solr45/solr/wa-en-collection_1/conf/
 \
 -Dcollection.configName=wa-en-collection \
 -DzkHost=hosts \
 -DnumShards=shards \
 -Dsolr.solr.home=/var/lib/answers/atlascloud/solr45/solr/ \
 -Dlog4j.configuration=file:///var/lib/answers/atlascloud/solr45/resources/log4j.properties
 \
 -Djetty.port=9101 \
 $JAVA_OPTIONS
 JETTY_HOME=/var/lib/answers/atlascloud/solr45/
 JETTY_USER=tomcat
 JETTY_LOGS=/var/lib/answers/atlascloud/solr45/logs
 **

 On Dec 4, 2013, at 3:21 PM, Greg Walters greg.walt...@answers.com wrote:

  I found the instructions and scripts on that page to be unclear and/or
 not work. Here's the script I've been using for solr 4.5.1:
 https://gist.github.com/gregwalters/7795791 Do note that you'll have to
 change a couple of paths to get things working correctly.
 
  Thanks,
  Greg
 
  On Dec 4, 2013, at 3:15 PM, Eric Palmer e...@ericfpalmer.com wrote:
 
  Hey all,
 
  I'm pretty new to solr.  I'm installing it on an amazon linux (rpm
 based)
  ec2 instance and have it running. I even have nutch feeding it pages
 from
  a crawl. I'm very happy about that.
 
  I want solr to start on a reboot and am following the instructions at
  http://wiki.apache.org/solr/SolrJetty#Starting
 
  I'm using solr 4.5.1 and when I check the jetty version I get this
 
  java -jar start.jar --version
  Active Options: [default, *]
  Version Information on 17 entries in the classpath.
  Note: order presented here is how they would appear on the classpath.
  changes to the OPTIONS=[option,option,...] command line option will
  be reflected here.
  0:(dir) | ${jetty.home}/resources
  1: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-xml-8.1.10.v20130312.jar
  2:  3.0.0.v201112011016 | ${jetty.home}/lib/servlet-api-3.0.jar
  3: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-http-8.1.10.v20130312.jar
  4: 8.1.10.v20130312 |
  ${jetty.home}/lib/jetty-continuation-8.1.10.v20130312.jar
  5: 8.1.10.v20130312 |
  ${jetty.home}/lib/jetty-server-8.1.10.v20130312.jar
  6: 8.1.10.v20130312 |
  ${jetty.home}/lib/jetty-security-8.1.10.v20130312.jar
  7: 8.1.10.v20130312 |
  ${jetty.home}/lib/jetty-servlet-8.1.10.v20130312.jar
  8: 8.1.10.v20130312 |
  ${jetty.home}/lib/jetty-webapp-8.1.10.v20130312.jar
  9: 8.1.10.v20130312 |
  ${jetty.home}/lib/jetty-deploy-8.1.10.v20130312.jar
  10:1.6.6 |
 ${jetty.home}/lib/ext/jcl-over-slf4j-1.6.6.jar
  11:1.6.6 | ${jetty.home}/lib/ext/jul-to-slf4j-1.6.6.jar
  12:   1.2.16 | ${jetty.home}/lib/ext/log4j-1.2.16.jar
  13:1.6.6 | ${jetty.home}/lib/ext/slf4j-api-1.6.6.jar
  14:1.6.6 | ${jetty.home}/lib/ext/slf4j-log4j12-1.6.6.jar
  15: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-util-8.1.10.v20130312.jar
  16: 8.1.10.v20130312 |
 ${jetty.home}/lib/jetty-io-8.1.10.v20130312.jar
 
  the instructions reference a jetty.sh script for version 6 and a
 different
  one for 7. Does the version 7 one work with jetty 8? If not where can I
 get
  the one for version 8?
 
  BTW - this is just the standard install of solr from the gzip file.
 
  thanks in advance for your help.
 
  --
  Eric Palmer
  U of Richmond
 




-- 
Eric Palmer


Re: Solr Stalls on Bulk indexing, no logs or errors

2013-12-04 Thread Erick Erickson
Wait, crashes? Or just stops accepting updates?

At any rate, this should be fixed in 4.6. If you
can dump a stack trace, we can identify whether this
is the same issue quickly. jstack is popular.

If you're still having queries served, it's probably
not your commit settings; try searching the
JIRA list for distributed deadlock. You should
find two JIRAs, one relevant to SolrJ by Joel
Bernstein (probably not the one you care about) and
one by Mark Miller that address this.

Best,
Erick


On Wed, Dec 4, 2013 at 3:19 PM, steven crichton stevencrich...@mac.comwrote:

 Yes I can continue to query after this importer goes down and whilst it
 running.

 The bulk commit is done via a JSON handler in php. There is 121,000
 records that need to go into the index. So this is done in 5000 chunked
 mySQL retrieve calls and parsing to the data as required.

 workflow:

 get record
 create {add doc… } JSON
 Post to CORE/update/json


 I stopped doing a hard commit every 1000 records. To see if that was an
 issue.


 the auto commit settings are ::

 autoCommit
   maxDocs${solr.autoCommit.MaxDocs:5000}/maxDocs
   maxTime${solr.autoCommit.MaxTime:24000}/maxTime
 /autoCommit


 I’ve pretty much worked out of the drupal schemas for SOLR 4
 https://drupal.org/project/apachesolr

 At one point I thought it could be malformed data, but even reducing the
 records down to just the id and title now .. it crashes at the same point.
 As in the query still works but the import handler does nothing at all


 Tomcat logs seem to indicate no major issues.


 There’s not a strange variable that is set to make an upper index limit is
 there?

 Regards,
 Steven



 On 4 Dec 2013, at 20:02, Erick Erickson [via Lucene] 
 ml-node+s472066n4104984...@n3.nabble.com wrote:

  There's a known issue with SolrCloud with multiple shards, but
  you haven't told us whether you're using that. The test for
  whether you're running in to that is whether you can continue
  to _query_, just not update.
 
  But you need to tell us more about our setup. In particular
  hour commit settings (hard and soft), your solrconfig settings,
  particularly around autowarming, how you're bulk indexing,
  SolrJ? DIH? a huge CSV file?
 
  Best,
  Erick
 
 
  On Wed, Dec 4, 2013 at 2:30 PM, steven crichton [hidden email]wrote:
 
   I am finding with a bulk index using SOLR 4.3 on Tomcat, that when I
 reach
   69578 records the server stops adding anything more.
  
   I've tried reducing the data sent to the bare minimum of fields and
 using
   ASC and DESC data to see if it could be a field issue.
  
   Is there anything I could look at for this? As I'm not finding anything
   similar noted before. Does tomcat have issues with closing connections
 that
   look like DDOS attacks? Or could it be related to too many commits in
 too
   short a time?
  
   Any help will be very greatly appreciated.
  
  
  
   --
   View this message in context:
  
 http://lucene.472066.n3.nabble.com/Solr-Stalls-on-Bulk-indexing-no-logs-or-errors-tp4104981.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
 
 





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-Stalls-on-Bulk-indexing-no-logs-or-errors-tp4104981p4104990.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent

2013-12-04 Thread Jack Krupansky
Ah... although the lower case filtering does get applied properly in a 
multiterm analysis scenario, stemming does not. What stemmer are you 
using? I suspect that swimming normally becomes swim. Compare the debug 
output of the two queries.


-- Jack Krupansky

-Original Message- 
From: Mhd Wrk

Sent: Wednesday, December 04, 2013 2:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Shouldn't fuzzy version of a solr query always return a super 
set of its not-fuzzy equivalent


Debug shows that all terms are lowercased properly.

Thanks
On Dec 4, 2013 3:18 AM, Erik Hatcher erik.hatc...@gmail.com wrote:


Chances are you're not getting those fuzzy terms analyzed as you'd like.
 See debug (debug=true) output to be sure.  Most likely the fuzzy terms
are not being lowercased.  See
http://wiki.apache.org/solr/MultitermQueryAnalysis for more details (this
applies to fuzzy, not just wildcard) terms too.

Erik


On Dec 4, 2013, at 4:46 AM, Mhd Wrk mhd...@gmail.com wrote:

 I'm using the following query to do a fuzzy search on Solr 4.5.1 and am
 getting empty result.

 qt=standardq=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2)
 +(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO
 2013-12-04T00:23:00Z] -endDate:[* TO
 2013-12-04T00:23:00Z])start=0rows=10fl=id

 If I change it to a not fuzzy query by simply dropping tildes from the
 terms (see below) then it returns the expected result! Is this a bug?
 Shouldn't fuzzy version of a query always return a super set of its
 not-fuzzy equivalent?

 qt=standard&q=+(field1|en_CA|:Swimming field1|en|:Swimming)
 +(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO
 2013-12-04T00:23:00Z] -endDate:[* TO
 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id






Re: a core for every user, lots of users... are there issues

2013-12-04 Thread Erick Erickson
Hank:

I should add that lots of cores and SolrCloud aren't guaranteed to play
nice together. I think some of the committers will be addressing this
sometime soon.

I'm not saying that this will certainly fail, OTOH I don't know anyone
who's combined the two.

Erick


On Wed, Dec 4, 2013 at 3:18 PM, hank williams hank...@gmail.com wrote:

 Super helpful. Thanks.


 On Wed, Dec 4, 2013 at 2:53 PM, Shawn Heisey s...@elyograg.org wrote:

  On 12/4/2013 12:34 PM, hank williams wrote:
 
  Ok one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we
  were *trying* to use the rest API function create to create cores
  without
  having to manually mess with files on the server. Is this what create
  was
  supposed to do? If so it was borken or we werent using it right. In any
  case in 4.6 is that the right way to programmatically add cores in
  discovery mode?
 
 
  If you are NOT in SolrCloud mode, in order to create new cores, the
 config
  files need to already exist on the disk.  This is the case with all
  versions of Solr.
 
  If you're running in SolrCloud mode, the core is associated with a
   collection.  Collections have a link to a config in zookeeper.  The config
  is not stored with the core on the disk.
 
  Thanks,
  Shawn
 
 


 --
 blog: whydoeseverythingsuck.com



Re: a core for every user, lots of users... are there issues

2013-12-04 Thread hank williams
Oh my... when you say "I don't know anyone who's combined the two", do you
mean that those who have tried have failed, or that no one has gotten
around to trying? It sounds like you are saying you have some specific
knowledge that right now these won't work; otherwise you wouldn't say
"committers will be addressing this sometime soon", right?

I'm worried, as we need to make a practical decision here, and it sounds like
maybe we should stick with plain Solr for now... is that what you are saying?


On Wed, Dec 4, 2013 at 5:01 PM, Erick Erickson erickerick...@gmail.comwrote:

 Hank:

 I should add that lots of cores and SolrCloud aren't guaranteed to play
 nice together. I think some of the committers will be addressing this
 sometime soon.

 I'm not saying that this will certainly fail, OTOH I don't know anyone
 who's combined the two.

 Erick


 On Wed, Dec 4, 2013 at 3:18 PM, hank williams hank...@gmail.com wrote:

  Super helpful. Thanks.
 
 
  On Wed, Dec 4, 2013 at 2:53 PM, Shawn Heisey s...@elyograg.org wrote:
 
   On 12/4/2013 12:34 PM, hank williams wrote:
  
   Ok one more simple question. We just upgraded to 4.6 from 4.2. In 4.2
 we
   were *trying* to use the rest API function create to create cores
   without
   having to manually mess with files on the server. Is this what
 create
   was
   supposed to do? If so it was borken or we werent using it right. In
 any
   case in 4.6 is that the right way to programmatically add cores in
   discovery mode?
  
  
   If you are NOT in SolrCloud mode, in order to create new cores, the
  config
   files need to already exist on the disk.  This is the case with all
   versions of Solr.
  
   If you're running in SolrCloud mode, the core is associated with a
   collection.  Collections have a link to aconfig in zookeeper.  The
 config
   is not stored with the core on the disk.
  
   Thanks,
   Shawn
  
  
 
 
  --
  blog: whydoeseverythingsuck.com
 




-- 
blog: whydoeseverythingsuck.com


Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with 
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).


Currently we are noticing inconsistent results from the SolrCloud when 
performing the same simple /select query many times to our collection. 
Almost every other query the numFound count (and the returned data) 
jumps between two very different values.


Initially I suspected a replica in a shard of the collection was 
inconsistent (and every other request hit that node) and started 
performing the same /select query direct to the individual cores of the 
SolrCloud collection on each instance, only to notice the same problem - 
the count jumps between two very different values!


I may be incorrect here, but I assumed when querying a single core of a 
SolrCloud collection, the SolrCloud routing is bypassed and I am talking 
directly to a plain/non-SolrCloud core.


As you can see here, the count for 1 core of my SolrCloud collection 
fluctuates wildly, and is only receiving updates and no deletes to 
explain the jumps:


solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]


Could anyone help me understand why the same /select query direct to a 
single core would return inconsistent, flapping results if there are no 
deletes issued in my app to cause such jumps? Am I incorrect in my 
assumption that I am querying the core directly?


An interesting observation is when I do an /admin/cores call to see the 
docCount of the core's index, it does not fluctuate, only the query result.


That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent

2013-12-04 Thread Mhd Wrk
I'm using snowball stemmer and, you are correct, swimming has been stored
as swim.

Should I wrap snowball filter in a multiterm analyzer?

Thanks
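
For reference, an explicit multiterm analyzer is declared per field type in
schema.xml; a minimal sketch of just that section (the tokenizer/filter choice
here is only an example, and whether stemming fuzzy terms actually gives the
results you want is a separate question):

  <analyzer type="multiterm">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>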
 On Dec 4, 2013 2:02 PM, Jack Krupansky j...@basetechnology.com wrote:

 Ah... although the lower case filtering does get applied properly in a
 multiterm analysis scenario, stemming does not. What stemmer are you
 using? I suspect that swimming normally becomes swim. Compare the debug
 output of the two queries.

 -- Jack Krupansky

 -Original Message- From: Mhd Wrk
 Sent: Wednesday, December 04, 2013 2:08 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Shouldn't fuzzy version of a solr query always return a super
 set of its not-fuzzy equivalent

 Debug shows that all terms are lowercased properly.

 Thanks
 On Dec 4, 2013 3:18 AM, Erik Hatcher erik.hatc...@gmail.com wrote:

  Chances are you're not getting those fuzzy terms analyzed as you'd like.
  See debug (debug=true) output to be sure.  Most likely the fuzzy terms
 are not being lowercased.  See
 http://wiki.apache.org/solr/MultitermQueryAnalysis for more details (this
 applies to fuzzy, not just wildcard) terms too.

 Erik


 On Dec 4, 2013, at 4:46 AM, Mhd Wrk mhd...@gmail.com wrote:

  I'm using the following query to do a fuzzy search on Solr 4.5.1 and am
  getting empty result.
 
  qt=standard&q=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2)
  +(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO
  2013-12-04T00:23:00Z] -endDate:[* TO
  2013-12-04T00:23:00Z])&start=0&rows=10&fl=id
 
  If I change it to a not fuzzy query by simply dropping tildes from the
  terms (see below) then it returns the expected result! Is this a bug?
  Shouldn't fuzzy version of a query always return a super set of its
  not-fuzzy equivalent?
 
  qt=standard&q=+(field1|en_CA|:Swimming field1|en|:Swimming)
  +(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO
  2013-12-04T00:23:00Z] -endDate:[* TO
  2013-12-04T00:23:00Z])&start=0&rows=10&fl=id






Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

To add two more pieces of data:

1) This occurs with real, conditional queries as well (eg: 
q=key:timvaillancourt), not just the q=*:* I provided in my email.
2) I've noticed that when I bring a node of the SolrCloud down it remains 
"state":"active" in my /clusterstate.json - something is really 
wrong with this cloud! Would a Zookeeper issue explain my varied results 
when querying a core directly?


Thanks again!

Tim

On 04/12/13 02:17 PM, Tim Vaillancourt wrote:

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with 
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).


Currently we are noticing inconsistent results from the SolrCloud when 
performing the same simple /select query many times to our collection. 
Almost every other query the numFound count (and the returned data) 
jumps between two very different values.


Initially I suspected a replica in a shard of the collection was 
inconsistent (and every other request hit that node) and started 
performing the same /select query direct to the individual cores of 
the SolrCloud collection on each instance, only to notice the same 
problem - the count jumps between two very different values!


I may be incorrect here, but I assumed when querying a single core of 
a SolrCloud collection, the SolrCloud routing is bypassed and I am 
talking directly to a plain/non-SolrCloud core.


As you can see here, the count for 1 core of my SolrCloud collection 
fluctuates wildly, and is only receiving updates and no deletes to 
explain the jumps:


solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound

  "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]


Could anyone help me understand why the same /select query direct to a 
single core would return inconsistent, flapping results if there are 
no deletes issued in my app to cause such jumps? Am I incorrect in my 
assumption that I am querying the core directly?


An interesting observation is when I do an /admin/cores call to see 
the docCount of the core's index, it does not fluctuate, only the 
query result.


That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


RE: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Markus Jelsma
https://issues.apache.org/jira/browse/SOLR-4260

Join the club Tim! Can you upgrade to trunk or incorporate the latest patches 
of related issues? You can fix it by trashing the bad node's data, although 
without multiple clusters it may be difficult to decide which node is bad.

We use the latest commits now (since tuesday) and are still waiting for it to 
happen again.

-Original message-
 From:Tim Vaillancourt t...@elementspace.com
 Sent: Wednesday 4th December 2013 23:38
 To: solr-user@lucene.apache.org
 Subject: Re: Inconsistent numFound in SC when querying core directly
 
 To add two more pieces of data:
 
 1) This occurs with real, conditional queries as well (eg: 
 q=key:timvaillancourt), not just the q=*:* I provided in my email.
 2) I've noticed when I bring a node of the SolrCloud down it is 
 remaining state: active in my /clusterstate.json - something is really 
 wrong with this cloud! Would a Zookeeper issue explain my varied results 
 when querying a core directly?
 
 Thanks again!
 
 Tim
 
 On 04/12/13 02:17 PM, Tim Vaillancourt wrote:
  Hey guys,
 
  I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with 
  3-node external Zookeeper and 1 collection (2 shards, 2 replicas).
 
  Currently we are noticing inconsistent results from the SolrCloud when 
  performing the same simple /select query many times to our collection. 
  Almost every other query the numFound count (and the returned data) 
  jumps between two very different values.
 
  Initially I suspected a replica in a shard of the collection was 
  inconsistent (and every other request hit that node) and started 
  performing the same /select query direct to the individual cores of 
  the SolrCloud collection on each instance, only to notice the same 
  problem - the count jumps between two very different values!
 
  I may be incorrect here, but I assumed when querying a single core of 
  a SolrCloud collection, the SolrCloud routing is bypassed and I am 
  talking directly to a plain/non-SolrCloud core.
 
  As you can see here, the count for 1 core of my SolrCloud collection 
  fluctuates wildly, and is only receiving updates and no deletes to 
  explain the jumps:
 
  solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
  'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
    "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]
 
  solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
  'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
    "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]
 
  solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
  'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
    "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]
 
  solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
  'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
    "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]
 
 
  Could anyone help me understand why the same /select query direct to a 
  single core would return inconsistent, flapping results if there are 
  no deletes issued in my app to cause such jumps? Am I incorrect in my 
  assumption that I am querying the core directly?
 
  An interesting observation is when I do an /admin/cores call to see 
  the docCount of the core's index, it does not fluctuate, only the 
  query result.
 
  That was hard to explain, hopefully someone has some insight! :)
 
  Thanks!
 
  Tim
 


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Thanks Markus,

I'm not sure if I'm encountering the same issue. That JIRA mentions 
differences of tens of docs; I'm seeing differences in the multi-millions of 
docs, and even more strangely it very predictably flaps between a 123M 
value and an 87M value, a 30M+ doc difference.


Secondly, I'm not comparing values from 2 instances (Leader to Replica), 
I'm currently performing the same curl call to the same core directly 
and am seeing flapping results each time I perform the query, so this is 
currently happening within a single instance/core unless I am 
misunderstanding how to directly query a core.


Cheers,

Tim

On 04/12/13 02:46 PM, Markus Jelsma wrote:

https://issues.apache.org/jira/browse/SOLR-4260

Join the club Tim! Can you upgrade to trunk or incorporate the latest patches 
of related issues? You can fix it by trashing the bad node's data, although 
without multiple clusters it may be difficult to decide which node is bad.

We use the latest commits now (since tuesday) and are still waiting for it to 
happen again.

-Original message-

From:Tim Vaillancourtt...@elementspace.com
Sent: Wednesday 4th December 2013 23:38
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent numFound in SC when querying core directly

To add two more pieces of data:

1) This occurs with real, conditional queries as well (eg:
q=key:timvaillancourt), not just the q=*:* I provided in my email.
2) I've noticed when I bring a node of the SolrCloud down it is
remaining state: active in my /clusterstate.json - something is really
wrong with this cloud! Would a Zookeeper issue explain my varied results
when querying a core directly?

Thanks again!

Tim

On 04/12/13 02:17 PM, Tim Vaillancourt wrote:

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).

Currently we are noticing inconsistent results from the SolrCloud when
performing the same simple /select query many times to our collection.
Almost every other query the numFound count (and the returned data)
jumps between two very different values.

Initially I suspected a replica in a shard of the collection was
inconsistent (and every other request hit that node) and started
performing the same /select query direct to the individual cores of
the SolrCloud collection on each instance, only to notice the same
problem - the count jumps between two very different values!

I may be incorrect here, but I assumed when querying a single core of
a SolrCloud collection, the SolrCloud routing is bypassed and I am
talking directly to a plain/non-SolrCloud core.

As you can see here, the count for 1 core of my SolrCloud collection
fluctuates wildly, and is only receiving updates and no deletes to
explain the jumps:

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
   "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
   "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
   "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true' | grep numFound
   "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]


Could anyone help me understand why the same /select query direct to a
single core would return inconsistent, flapping results if there are
no deletes issued in my app to cause such jumps? Am I incorrect in my
assumption that I am querying the core directly?

An interesting observation is when I do an /admin/cores call to see
the docCount of the core's index, it does not fluctuate, only the
query result.

That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Chris Hostetter
: 
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by solr is handled by a single 
core -- if that core knows it's part of a SolrCloud collection then it 
will do a distributed search across a random replica from each shard in 
that collection.

If you want to bypass the distribute search logic, you have to say so 
explicitly...

To ask an arbitrary replica to only search itself add distrib=false to 
the request.

Alternatively: you can ask that only certain shard names (or certain 
explicit replicas) be included in a distribute request..

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
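
Concretely, against one of the cores from earlier in this thread that would be
something along the lines of:

  curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&distrib=false' | grep numFound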



-Hoss
http://www.lucidworks.com/


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt
Chris, this is extremely helpful and it's silly I didn't think of this 
sooner! Thanks a lot, this makes the situation make much more sense.


I will gather some proper data with your suggestion and get back to the 
thread shortly.


Thanks!!

Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:

:
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by solr is handled by a single
core -- if that core knows it's part of a SolrCloud collection then it
will do a distributed search across a random replica from each shard in
that collection.

If you want to bypass the distribute search logic, you have to say so
explicitly...

To ask an arbitrary replica to only search itself add distrib=false to
the request.

Alternatively: you can ask that only certain shard names (or certain
explicit replicas) be included in a distribute request..

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests



-Hoss
http://www.lucidworks.com/


Re: SOLR Master-Slave Repeater with Load balancer

2013-12-04 Thread kondamudims
Hi Erick,
Thanks a lot for your explanation. We initially considered SolrCloud, but we
have a limit on the number of servers that we can use due to budget
concerns (the limit is 2), and SolrCloud requires a minimum of 3. I have tried
out the solution you suggested; so far it's going well, and we are no longer
using the self-polling approach.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Master-Slave-Repeater-with-Load-balancer-tp4103363p4105017.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Hey all,

Now that I am getting correct results with distrib=false, I've 
identified that 1 of my nodes has just 1/3rd of the total data set, which 
totally explains the flapping in results. The fix for this is obvious 
(rebuild the replica), but the cause is less obvious.


There is definitely more than one issue going on with this SolrCloud 
(but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that 
/clusterstate.json doesn't seem to get updated when nodes are brought 
down/up is the reason why this replica remained in the distributed 
request chain without recovering/re-replicating from leader.


I imagine my Zookeeper ensemble is having some problems unrelated to 
Solr that is the real root cause.


Thanks!

Tim

On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
Chris, this is extremely helpful and it's silly I didn't think of this 
sooner! Thanks a lot, this makes the situation make much more sense.


I will gather some proper data with your suggestion and get back to 
the thread shortly.


Thanks!!

Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:

:
: I may be incorrect here, but I assumed when querying a single core 
of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am 
talking

: directly to a plain/non-SolrCloud core.

No ... every query received from a client by solr is handled by a single
core -- if that core knows it's part of a SolrCloud collection then it
will do a distributed search across a random replica from each shard in
that collection.

If you want to bypass the distribute search logic, you have to say so
explicitly...

To ask an arbitrary replica to only search itself add distrib=false to
the request.

Alternatively: you can ask that only certain shard names (or certain
explicit replicas) be included in a distribute request..

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests



-Hoss
http://www.lucidworks.com/


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Mark Miller
Keep in mind, there have been a *lot* of bug fixes since 4.3.1.

- Mark

On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt t...@elementspace.com wrote:

 Hey all,
 
 Now that I am getting correct results with distrib=false, I've identified 
 that 1 of my nodes has just 1/3rd of the total data set and totally explains 
 the flapping in results. The fix for this is obvious (rebuild replica) but 
 the cause is less obvious.
 
  There is definitely more than one issue going on with this SolrCloud (but 1 
 down thanks to Chris' suggestion!), so I'm guessing the fact that 
 /clusterstate.json doesn't seem to get updated when nodes are brought down/up 
 is the reason why this replica remained in the distributed request chain 
 without recovering/re-replicating from leader.
 
 I imagine my Zookeeper ensemble is having some problems unrelated to Solr 
 that is the real root cause.
 
 Thanks!
 
 Tim
 
 On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
 Chris, this is extremely helpful and it's silly I didn't think of this 
 sooner! Thanks a lot, this makes the situation make much more sense.
 
 I will gather some proper data with your suggestion and get back to the 
 thread shortly.
 
 Thanks!!
 
 Tim
 
 On 04/12/13 02:57 PM, Chris Hostetter wrote:
 :
 : I may be incorrect here, but I assumed when querying a single core of a
 : SolrCloud collection, the SolrCloud routing is bypassed and I am talking
 : directly to a plain/non-SolrCloud core.
 
 No ... every query received from a client by solr is handled by a single
 core -- if that core knows it's part of a SolrCloud collection then it
 will do a distributed search across a random replica from each shard in
 that collection.
 
 If you want to bypass the distribute search logic, you have to say so
 explicitly...
 
 To ask an arbitrary replica to only search itself add distrib=false to
 the request.
 
 Alternatively: you can ask that only certain shard names (or certain
 explicit replicas) be included in a distribute request..
 
 https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
 
 
 
 -Hoss
 http://www.lucidworks.com/



Prioritize search returns by URL path?

2013-12-04 Thread Jim Glynn
We have a Telligent-based community with Solr as the search engine. We want
to prioritize search results from within the community by type of
content: Wiki articles as most relevant, then blog posts, then Verified
answer and Suggested answer forum posts, then remaining forum posts. We have
also implemented a Helpful voting capability and would like to boost items
with more Helpful votes above those in the same category with fewer
votes.

Has anyone out there done something similar, or can someone suggest how to
do this? We're new to search engine tuning, so assume very little knowledge
on our part.

Thanks for your help!
JRG
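
One common approach is the edismax query parser with boost queries and a boost
function; a rough sketch (the contentType and helpfulVotes field names are
hypothetical and would have to exist in your schema):

  q=your search terms
  &defType=edismax
  &bq=contentType:wiki^10 contentType:blog^5 contentType:verified^3 contentType:suggested^2
  &bf=log(sum(helpfulVotes,1))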



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Prioritize-search-returns-by-URL-path-tp4105023.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to increase each index file size

2013-12-04 Thread YouPeng Yang
Hi Erick

   Thanks for your reply.


Regards


2013/12/4 Erick Erickson erickerick...@gmail.com

 Why do you want to do this? Are you seeing performance problems?
 If not, I'd just ignore this problem, premature optimization and all that.

  If you _really_ want to do this, your segment files are closed every
  time you do a commit; openSearcher=true|false doesn't matter.

 BUT, the longer these are the bigger your transaction log will be,
 which may lead to other issues, particularly on restart. See:

 http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

 The key is the section on truncating the tlog.

 And note the sizes of these segments will change as they're
 merged anyway.

 Best,
 Erick


 On Wed, Dec 4, 2013 at 4:42 AM, YouPeng Yang yypvsxf19870...@gmail.com
 wrote:

   Hi
     I'm using SolrCloud integrated with HDFS, and I found there are lots of
   small files.
     So, I'd like to increase the index file size while doing a DIH
   full-import. Any suggestions on how to achieve this?
 
 
  Regards.
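
As a footnote to the reply above: fewer, larger index files generally come
from committing less often, a larger indexing RAM buffer, and the merge
policy. An illustrative (not prescriptive) indexConfig fragment for
solrconfig.xml:

  <indexConfig>
    <ramBufferSizeMB>256</ramBufferSizeMB>
    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <double name="maxMergedSegmentMB">5120</double>
    </mergePolicy>
  </indexConfig>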
 



Re: a core for every user, lots of users... are there issues

2013-12-04 Thread Erick Erickson
I don't know of anyone who's tried and failed to combine transient cores
and SolrCloud. I also don't know of anyone who's tried and succeeded.

I'm saying that the transient core stuff has been thoroughly tested in
non-cloud mode. And people have been working with it for a couple of
releases now. I know of no a-priori reason it wouldn't work in SolrCloud.
But I haven't personally done it, nor do I know of anyone who has. It might
just work, but the proof is in the pudding.

I've heard some scuttlebutt that the combination of SolrCloud and transient
cores is being, or will be soon, investigated. As in testing and writing
test cases. Being a pessimist by nature on these things, I suspect (but
don't know) that something will come up.

For instance, SolrCloud tries to keep track of all the states of all the
nodes. I _think_ (but don't know for sure) that this is just keeping
contact with the JVM, not particular cores. But what if there's something I
don't know about that pings the individual cores? That would keep them
constantly loading/unloading, which might crop up in unexpected ways. I've
got to emphasize that this is an unknown (at least to me), but an example
of something that could crop up. I'm sure there are other possibilities.

Or distributed updates. For that, every core on every node for a shard in
collectionX must process the update. So for updates, each and every core in
each and every shard might have to be loaded for the update to succeed if
the core is transient. Does this happen fast enough in all cases so a
timeout doesn't cause the update to fail? Or the node to be marked as down?
What about combining that with a heavy query load? I just don't know.

It's uncharted territory is all. I'd love it for you to volunteer to be the
first :). There's certainly committer interest in making this case work so
you wouldn't be left hanging all alone. If I were planning a product
though, I'd either treat the combination of transient cores and SolrCloud
as an R&D project or go with non-cloud mode until I had some reassurance
that transient cores and SolrCloud played nicely together.

All that said, I don't want to paint too bleak a picture. All the transient
core stuff is local to a particular node. SolrCloud and ZooKeeper shouldn't
be interested in the details. It _should_ just work. It's just that I
can't point to any examples where that's been tried.
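
For what it's worth, the transient-core declarations themselves are per-core
and independent of SolrCloud; in old-style solr.xml they look roughly like
this (names are made up):

  <solr persistent="true">
    <cores adminPath="/admin/cores" transientCacheSize="128">
      <core name="user_0001" instanceDir="user_0001"
            transient="true" loadOnStartup="false"/>
    </cores>
  </solr>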

Best,
Erick


On Wed, Dec 4, 2013 at 5:08 PM, hank williams hank...@gmail.com wrote:

 Oh my... when you say I don't know anyone who's combined the two. do you
 mean that those that have tried have failed or that no one has gotten
 around to trying? It sounds like you are saying you have some specific
  knowledge that right now these won't work, otherwise you wouldn't say
 committers
 will be addressing this sometime soon, right?

 I'm worried as we need to make a practical decision here and it sounds like
 maybe we should stick with solr for now... is that what you are saying?


 On Wed, Dec 4, 2013 at 5:01 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  Hank:
 
  I should add that lots of cores and SolrCloud aren't guaranteed to play
  nice together. I think some of the committers will be addressing this
  sometime soon.
 
  I'm not saying that this will certainly fail, OTOH I don't know anyone
  who's combined the two.
 
  Erick
 
 
  On Wed, Dec 4, 2013 at 3:18 PM, hank williams hank...@gmail.com wrote:
 
   Super helpful. Thanks.
  
  
   On Wed, Dec 4, 2013 at 2:53 PM, Shawn Heisey s...@elyograg.org
 wrote:
  
On 12/4/2013 12:34 PM, hank williams wrote:
   
Ok one more simple question. We just upgraded to 4.6 from 4.2. In
 4.2
  we
were *trying* to use the rest API function create to create cores
without
having to manually mess with files on the server. Is this what
  create
was
supposed to do? If so it was borken or we werent using it right. In
  any
case in 4.6 is that the right way to programmatically add cores in
discovery mode?
   
   
If you are NOT in SolrCloud mode, in order to create new cores, the
   config
files need to already exist on the disk.  This is the case with all
versions of Solr.
   
If you're running in SolrCloud mode, the core is associated with a
collection.  Collections have a link to aconfig in zookeeper.  The
  config
is not stored with the core on the disk.
   
Thanks,
Shawn
   
   
  
  
   --
   blog: whydoeseverythingsuck.com
  
 



 --
 blog: whydoeseverythingsuck.com



Re: SOLR Master-Slave Repeater with Load balancer

2013-12-04 Thread Erick Erickson
bq:  but we have limitation on the number of servers that we can use due to
budget
concerns (limit is 2)

really, really, really push back to your project managers on this. So what
if you need 3 machines for a ZooKeeper quorum? The needs of ZK are quite
light; they don't need a powerful machine. Your managers are effectively
saying that for want of spending $1,000 on a machine, they would rather
waste 10 times that paying engineers to set up an old-style system than go
with SolrCloud. You can run the ZooKeeper instances in a separate JVM on
your two servers and have a cheap machine running ZK for the third instance
if necessary.
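
A standalone three-node ensemble is nothing more than a zoo.cfg per machine
along these lines (hostnames are placeholders), plus a matching myid file in
each dataDir:

  tickTime=2000
  initLimit=10
  syncLimit=5
  dataDir=/var/lib/zookeeper
  clientPort=2181
  server.1=solr-1.example.com:2888:3888
  server.2=solr-2.example.com:2888:3888
  server.3=cheap-box.example.com:2888:3888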

Another rant finished.

Erick


On Wed, Dec 4, 2013 at 6:07 PM, kondamudims kondamud...@gmail.com wrote:

 Hi Erick,
 Thanks a lot for your explanation. We initially considered Solr Cloud but
 we
 have limitation on the number of servers that we can use due to budget
 concerns (limit is 2) Solr Cloud requires minimum 3. I have tried out the
 solution you suggested and so far its going well and we are not doing self
 polling concept.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLR-Master-Slave-Repeater-with-Load-balancer-tp4103363p4105017.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR Master-Slave Repeater with Load balancer

2013-12-04 Thread Walter Underwood
Erick is right, you have been put in a terrible position.

You need to get agreement, in writing, that it is OK for search to go down when 
one server is out of service. This might be for scheduled maintenance or even a 
config update. When one server is down, search is down, period.

This requirement is like choosing a truck, but insisting that there is only 
budget for three tires.

You must, must, must communicate the risks associated with a two-server 
SolrCloud cluster.

wunder

On Dec 4, 2013, at 7:10 PM, Erick Erickson erickerick...@gmail.com wrote:

 bq:  but we have limitation on the number of servers that we can use due to
 budget
 concerns (limit is 2)
 
 really, really, really push back to your project managers on this. So what
 you need 3 machines for a ZooKeeper quorum? The needs of ZK are quite
 light, they don't need a powerful machine. Your managers are saying for
 want of spending $1,000 on a machine, which we will waste 10 times that
 paying engineers to set up an old-style system, we can't go with
 SolrCloud. You can run the ZooKeeper instances in a separate JVM on your
 two servers and have a cheap machine running ZK for the third instance if
 necessary.
 
 Another rant finished.
 
 Erick
 
 
 On Wed, Dec 4, 2013 at 6:07 PM, kondamudims kondamud...@gmail.com wrote:
 
 Hi Erick,
 Thanks a lot for your explanation. We initially considered Solr Cloud but
 we
 have limitation on the number of servers that we can use due to budget
 concerns (limit is 2) Solr Cloud requires minimum 3. I have tried out the
 solution you suggested and so far its going well and we are not doing self
 polling concept.
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLR-Master-Slave-Repeater-with-Load-balancer-tp4103363p4105017.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 

--
Walter Underwood
wun...@wunderwood.org





SOLR 4 not utilizing multi CPU cores

2013-12-04 Thread Salman Akram
Hi,

We recently upgraded to SOLR 4.6 from SOLR 1.4.1. Overall the performance
went down for large phrase queries. On some analysis we have seen that
1.4.1 utilized multiple CPU cores for such queries, but SOLR 4.6 is only
utilizing a single CPU core. Any idea what the reason could be?

Note: We are not using SOLR Sharding.

-- 
Regards,

Salman Akram


Re: SOLR 4 not utilizing multi CPU cores

2013-12-04 Thread Andrea Gazzarini
Hi, I did more or less the same but didn't get that behaviour... could you
give us more details?

Best,
Gazza
On 5 Dec 2013 06:54, Salman Akram salman.ak...@northbaysolutions.net
wrote:

 Hi,

 We recently upgraded to SOLR 4.6 from SOLR 1.4.1. Overall the performance
 went down for large phrase queries. On some analysis we have seen that
 1.4.1 utilized multiple cpu cores for such queries but SOLR 4.6 is only
 utilizing single cpu core. Any idea on what could be the reason?

 Note: We are not using SOLR Sharding.

 --
 Regards,

 Salman Akram



Re: facet.method=fcs vs facet.method=fc on solr slaves

2013-12-04 Thread Mikhail Khludnev
Hello Patrick,

Replication flushes the UnInvertedField cache, which impacts fc, but doesn't
harm Lucene's FieldCache, which is what fcs uses. You can check how much time
in millis is spent on UnInvertedField cache regeneration in INFO log lines like
UnInverted multi-valued field ,time=### ...
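
For comparison, both methods can be exercised against the same slave with
something like (host, core and field names are placeholders):

  curl 'http://localhost:8983/solr/core1/select?q=*:*&rows=0&facet=true&facet.field=category&facet.method=fc'
  curl 'http://localhost:8983/solr/core1/select?q=*:*&rows=0&facet=true&facet.field=category&facet.method=fcs'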


On Thu, Dec 5, 2013 at 12:15 AM, Patrick O'Lone pol...@townnews.com wrote:

 Is there any advantage on a Solr slave to receive queries using
 facet.method=fcs instead of the default of facet.method=fc? Most of the
 segment files are unchanged between replication events - but I wasn't
 sure if replication would cause the unchanged segment field caches to be
 lost anyway.
 --
 Patrick O'Lone
 Director of Software Development
 TownNews.com

 E-mail ... pol...@townnews.com
 Phone  309-743-0809
 Fax .. 309-743-0830




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Questions about commits and OOE

2013-12-04 Thread Mikhail Khludnev
On Wed, Dec 4, 2013 at 6:36 PM, OSMAN Metin metin.os...@canal-plus.comwrote:

  During this massive update, we sometimes see a peak of active threads
  exceeding the limit of 8192 processes allowed for the user running the
  tomcat and zookeeper processes.
  When this happens, every hard commit fails with an OutOfMemory:
  unable to create native thread message.


Hello,

Can you check with jstack what these threads are? If they are web container
threads you need to limit the thread pool; if they are background merge
threads you might need to configure the merge policy, etc.
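
A quick way to get that breakdown (the pgrep pattern is just an example for a
Tomcat process):

  jstack $(pgrep -f catalina) > /tmp/threads.txt
  # group thread names to see which pool dominates
  grep '^"' /tmp/threads.txt | cut -d'"' -f2 | sed 's/-[0-9]*$//' | sort | uniq -c | sort -rn | head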


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Sorting on solr results

2013-12-04 Thread anuragwalia
Hi All,

Please share your ideas on the problem below.

I need to sort products on a webshop by price and position.

e.g. if we have three products (A, B, C), they need to be sorted by price asc
and then position asc.

ID  Price   Position
A   10  3
B   10  2
C   20  5

The result should be sorted first by price, then by position.

Required order of results:
B
A
C
A and B have the same price, but B should rank above A because of its position.
My result-set query as of now
:@QueryTerm=*OnlineFlag=1@Sort.Price=0,position=0

Please suggest your views for the same.
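
In plain Solr request syntax that ordering is expressed directly in the sort
parameter; assuming Price and position are indexed, sortable fields, it would
be something like:

  q=*:*&fq=OnlineFlag:1&sort=Price asc,position asc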

 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-on-solr-results-tp4105060.html
Sent from the Solr - User mailing list archive at Nabble.com.