Solr Indexing MAX FILE LIMIT

2012-11-13 Thread mitra
 Hello Guys

I'm using Apache Solr 3.6.1 on Tomcat 7 for indexing CSV files using curl on a
Windows machine

** My question is: what is the maximum CSV file size when doing an
HTTP POST, or when using the following curl command?
curl http://localhost:8080/solr/update/csv -F stream.file=D:\eighth.csv -F
commit=true -F optimize=true -F encapsulate= -F keepEmpty=true

** My requirement is quite large because we have to index CSV files ranging
from 8 to 10 GB

** What would be the optimum settings for index parameters like commit, for
better performance on a machine with 8 GB RAM?

Please guide me on it

Thanks in Advance





RE: Solr Indexing MAX FILE LIMIT

2012-11-13 Thread Markus Jelsma
Hi - instead of trying to make the system ingest such large files, perhaps you
can split the files into many small pieces.
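
A minimal bash sketch of that approach (file name, chunk size, and paths are
illustrative assumptions; the poster is on Windows, so this presumes a
Unix-like shell or Cygwin):

    #!/bin/bash
    # Split a large CSV into ~100k-row chunks, re-attach the header to each
    # chunk, and post the chunks one at a time, committing once at the end.
    head -n 1 eighth.csv > header.csv
    tail -n +2 eighth.csv | split -l 100000 - chunk_
    for f in chunk_*; do
      cat header.csv "$f" > "$f.csv"
      curl "http://localhost:8080/solr/update/csv" \
           -F "stream.file=$(pwd)/$f.csv" -F commit=false
      rm "$f" "$f.csv"
    done
    curl "http://localhost:8080/solr/update?commit=true"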
 
-Original message-
 From:mitra mitra.re...@ornext.com
 Sent: Tue 13-Nov-2012 09:05
 To: solr-user@lucene.apache.org
 Subject: Solr Indexing MAX FILE LIMIT
 
  Hello Guys
 
 I'm using Apache Solr 3.6.1 on Tomcat 7 for indexing CSV files using curl on a
 Windows machine
 
 ** My question is: what is the maximum CSV file size when doing an
 HTTP POST, or when using the following curl command?
 curl http://localhost:8080/solr/update/csv -F stream.file=D:\eighth.csv -F
 commit=true -F optimize=true -F encapsulate= -F keepEmpty=true
 
 ** My requirement is quite large because we have to index CSV files ranging
 from 8 to 10 GB
 
 ** What would be the optimum settings for index parameters like commit, for
 better performance on a machine with 8 GB RAM?
 
 Please guide me on it
 
 Thanks in Advance
 
 
 
 


RE: Solr Indexing MAX FILE LIMIT

2012-11-13 Thread mitra
Thank you.


*** I understand that the default size for an HTTP POST in Tomcat is 2 MB. Can we
change that somehow,
   so that I don't need to split the 10 GB CSV into 2 MB chunks?

curl http://localhost:8080/solr/update/csv -F stream.file=D:\eighth.csv -F
commit=true -F optimize=true -F encapsulate= -F keepEmpty=true 

*** As I mentioned, I'm using the above command to post, rather than this
format below:

curl http://localhost:8080/solr/update/csv --data-binary @eighth.csv -H
'Content-type:text/plain; charset=utf-8'

*** My question: is the limit still applicable even when not using the
data-binary format above?
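
For context, a hedged aside: with -F stream.file=... the file contents are not
part of the POST body at all; Solr receives only the path and reads the file
from its local disk, which requires remote streaming to be enabled. The knobs
that usually matter are in solrconfig.xml (values below are illustrative):

    <requestDispatcher>
      <!-- stream.file / stream.url only work when remote streaming is on;
           multipartUploadLimitInKB caps files that really are uploaded -->
      <requestParsers enableRemoteStreaming="true"
                      multipartUploadLimitInKB="2048000" />
    </requestDispatcher>

Tomcat's own 2 MB default applies to container-parsed POST parameters and is
controlled by the maxPostSize attribute on the Connector in server.xml.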






Removing Shards from Zookeeper - no servers hosting shard

2012-11-13 Thread Gilles Comeau
Hi all,

We've just updated to SOLR 4.0 production and Zookeeper 3.3.6 from SOLR 4.0 
development version circa November 2011.  We keep 6 months of data online in 
our primary cluster, and archive off old stuff to a slower disk archive 
cluster.   We used to remove SOLR cores with the following code, but everything 
has changed in Zookeeper now.

Old code to remove cores from Zookeeper:


curl "http://127.0.0.1:8080/solr/admin/cores?action=UNLOAD&core=${SHARD}"

echo Removing indexes from all Zookeeper hosts
for (( i=0; i<${#ZK_HOSTS[*]}; i++ ))
do
$JAVA -cp 
.:/apps/zookeeper-3.3.5/zookeeper-3.3.5.jar:/apps/zookeeper-3.3.5/lib/jline-0.9.94.jar:/apps/zookeeper-3.3.5/lib/log4j-1.2.15.jar
 org.apache.zookeeper.ZooKeeperMain -server ${ZK_HOSTS[$i]} delete 
/collections/polecat/shards/solrenglish:8080_solr_$SHARD/$HOSTNAME:8080_solr_$SHARD
$JAVA -cp 
.:/apps/zookeeper-3.3.5/zookeeper-3.3.5.jar:/apps/zookeeper-3.3.5/lib/jline-0.9.94.jar:/apps/zookeeper-3.3.5/lib/log4j-1.2.15.jar
 org.apache.zookeeper.ZooKeeperMain -server ${ZK_HOSTS[$i]} delete 
/collections/polecat/shards/solrenglish:8080_solr_$SHARD
done

curl "http://solrmaster01:8080/solr/admin/cores?action=RELOAD&core=master"

Now that we have migrated, I have tried removing cores from Zookeeper by 
removing the stuff for the unloaded core in leaders and leader_elect, but 
for some reason SOLR keeps sending the requests to the shard, and I end up with 
the "no servers hosting shard" error.

Does anyone know how to remove a SOLR core from a SOLR server, have 
Zookeeper updated, and have distributed queries still work?   The only thing I 
know how to do now is stop tomcat, stop zookeeper, clear out the data directory 
and then restart both.   This isn't really ideal for a process I'd like to have 
running each night, and surely it is something others have hit.  I've tried 
Google searching, and what I find is references to the bug where solr notifies 
zookeeper on core unloads, which is marked as fixed, and people talking about 
how it doesn't work, but that if you run reloads on each core, it will work.  (That 
also doesn't work when I do it.)

Regards,

Gilles Comeau


Re: Nested Join Queries

2012-11-13 Thread Mikhail Khludnev
Gerald,

I wonder if you have tried BlockJoin for your problem? Can you
afford less frequent updates?


On Wed, Nov 7, 2012 at 5:40 PM, Gerald Blanck gerald.bla...@barometerit.com
 wrote:

 Thank you Erick for your reply.  I understand that search is not an RDBMS.
  Yes, we do have a huge combinatorial explosion if we de-normalize and
 duplicate data.  In fact, I believe our use case is exactly what the Solr
 developers were trying to solve with the addition of the Join query.  And
 while the example I gave illustrates the problem we are solving with the
 Join functionality, it is simplistic in nature compared to what we have in
 actuality.

 Am still looking for an answer here if someone can shed some light.
  Thanks.


 On Sat, Nov 3, 2012 at 9:38 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  I'm going to go a bit sideways on you, partly because I can't answer the
  question G...
 
  But, every time I see someone doing what looks like substituting core
 for
  table and
  then trying to use Solr like a DB, I get on my soap-box and preach..
 
  In this case, consider de-normalizing your DB so you can ask the query in
  terms
  of search rather than joins. e.g.
 
  Make each document a combination of the author and the book, with an
  additional
  field author_has_written_a_bestseller. Now your query becomes a really
  simple
  search, author:name AND author_has_written_a_bestseller:true. True,
 this
  kind
  of approach isn't as flexible as an RDBMS, but it's a _search_ rather
 than
  a query.
  Yes, it replicates data, but unless you have a huge combinatorial
  explosion, that's
  not a problem.
 
  And the join functionality isn't called pseudo for nothing. It was
  written for a specific
  use-case. It is often expensive, especially when the field being joined
 has
  many unique
  values.
 
  FWIW,
  Erick
 
 
  On Fri, Nov 2, 2012 at 11:32 AM, Gerald Blanck 
  gerald.bla...@barometerit.com wrote:
 
   At a high level, I have a need to be able to execute a query that joins
   across cores, and that query during its joining may join back to the
   originating core.
  
   Example:
   Find all Books written by an Author who has written a best selling
 Book.
  
   In Solr query syntax
   A) against the book core - bestseller:true
   B) against the author core - {!join fromIndex=book from=id
   to=bookid}bestseller:true
   C) against the book core - {!join fromIndex=author from=id
   to=authorid}{!join fromIndex=book from=id to=bookid}bestseller:true
  
   A - returns results
   B - returns results
   C - does not return results
  
   Given that A and C use the same core, I started looking for join code
  that
   compares the originating core to the fromIndex and found this
   in JoinQParserPlugin (line #159).
  
    if (info.getReq().getCore() == fromCore) {
      // if this is the same core, use the searcher passed in...
      // otherwise we could be warming and get an older searcher from the core.
      fromSearcher = searcher;
    } else {
      // This could block if there is a static warming query with a
      // join in it, and if useColdSearcher is true.
      // Deadlock could result if two cores both had useColdSearcher
      // and had joins that used eachother.
      // This would be very predictable though (should happen every
      // time if misconfigured)
      fromRef = fromCore.getSearcher(false, true, null);

      // be careful not to do anything with this searcher that requires
      // the thread local SolrRequestInfo in a manner that requires the
      // core in the request to match
      fromSearcher = fromRef.get();
    }
  
   I found that if I were to modify the above code so that it always
 follows
   the logic in the else block, I get the results I expect.
  
   Can someone explain to me why the code is written as it is?  And if we
  were
   to run with only the else block being executed, what type of adverse
   impacts we might have?
  
   Does anyone have other ideas on how to solve this issue?
  
   Thanks in advance.
   -Gerald
  
 



 --

 Gerald Blanck

 barometerIT

 1331 Tyler Street NE, Suite 100
 Minneapolis, MN 55413


 612.208.2802

 gerald.bla...@barometerit.com




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: The question about ConcurrentUpdateSolrServer

2012-11-13 Thread Mikhail Khludnev
L'ubov',

Yes it does. There were only two long requests with huge bodies, each containing
roughly 125K docs. You can also check the Solr-side LogUpdateProcessor
log messages regarding the number of requests and docs passed in each.
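
For reference, a minimal SolrJ sketch of the setup described above (the URL,
queue size, and thread count mirror the thread; the field name is an
assumption):

    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndexer {
      public static void main(String[] args) throws Exception {
        // queueSize=10, threadCount=2, as in the question. Each runner thread
        // keeps one long-lived HTTP POST open and streams queued docs into it,
        // which is why only two "Status for: ..." lines show up.
        ConcurrentUpdateSolrServer server =
            new ConcurrentUpdateSolrServer("http://localhost:8080/solr", 10, 2);
        for (int i = 0; i < 2500000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", i);
          server.add(doc);             // queued; sent by the runner threads
        }
        server.blockUntilFinished();   // drain the queue
        server.commit();
      }
    }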


On Wed, Nov 7, 2012 at 5:26 PM, Lyuba Romanchuk
lyuba.romanc...@gmail.com wrote:

 Hi,
 If I run my application that uses solrj API (ConcurrentUpdateSolrServer
 with buffer 10 and thread count 2) I get the logs (see below) with only two
 rows like *Status for: uid is 200. *
 Does it mean that only two http requests were send?
 The application indexes 2,500,000 different documents, and this is the
 number of docs that I get in web ui. But I thought that I should see a lot
 of rows like this not only 2, something like ~250,000.

 *17:47:42,842  INFO org.apache.solr.client.solrj.impl.HttpClientUtil:102 -
 Creating new http client,
 config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false*
 *17:47:43,122  INFO org.apache.solr.client.solrj.impl.HttpClientUtil:102 -
 Creating new http client,
 config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false*
 *17:47:43,128  INFO org.apache.solr.client.solrj.impl.HttpClientUtil:102 -
 Creating new http client,
 config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false*
 *17:47:43,539  INFO
 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer:121 - starting
 runner:

 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner@3bc8c52e
 *
 *17:47:43,539  INFO
 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer:121 - starting
 runner:

 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner@7a096dab
 *
 *17:50:46,257  INFO
 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer:200 - Status
 for: 5e41920f-b49b-4062-8f01-06e3d36926c9 is 200*
 *17:50:46,257  INFO
 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer:200 - Status
 for: 185b1dfd-d0b7-4f75-bfc5-1e38e89a05f2 is 200*
 *17:50:46,258  INFO
 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer:240 -
 finished:

 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner@3bc8c52e
 *
 *17:50:46,258  INFO
 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer:240 -
 finished:

 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner@7a096dab
 *



 Best regards,
 Lyuba




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: How to speed up Facet count (Big index) ??!!!!

2012-11-13 Thread Aeroox Aeroox
Thanks  Yonik.

Should I consider sharding in this case (actually I have one big index
with replication)? Or create two indexes (one for search and the other for facets,
on a different machine)?

Thanks folks

With love from Paris (it's raining today :(

Le mardi 13 novembre 2012, Yonik Seeley a écrit :

 On Mon, Nov 12, 2012 at 8:39 PM, Aeroox Aeroox 
 aero...@gmail.com
 wrote:
  Hi folks,
 
  I have a solr index with up to 50M documents. A document contains 62
 fields
  (docid, name, location).
 
  The facet count took 1 to 2 minutes with this params :
 
  http://.../select/?q=solr&version=2.2&start=0&rows=0&facet=true&facet.limit=6&facet.mincount=1&mm=3<-1&facet.field=schoolname_hl&facet.method=fc

 It should hopefully just take that long the first time?  How much time
 does it take to facet on the same field subsequent times?

  And my cache policy :
 
  <filterCache class="solr.FastLRUCache"
               size="4096"
               initialSize="4096"
               autowarmCount="4096"/>

  <queryResultCache class="solr.LRUCache"
               size="5000"
               initialSize="5000"
               autowarmCount="5000"/>

 These are relatively big caches - consider reducing them if you can.
 Especially the filter cache, depending on what percent of the entries
 are bitsets.
 Worst case would be 50M / 8 * 4096 = 25GB of bitsets.

  * I'm using solr 1.4 (LUCENE_36)
  * 64GB Ram (with 60GB allocated to java/tomcat6)

 Reduce this if you can - it doesn't leave enough memory for the OS to
 cache the index files and can contribute to slowness (more disk IO).

 -Yonik
 http://lucidworks.com



Re: how to sort the solr suggester's result

2012-11-13 Thread Erick Erickson
Could you just sort the suggestions at the app level? That is, read them
all into a list and sort before presenting them to the user?

Best
Erick
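
A hedged sketch of that app-level sort, assuming the transformer described in
the quoted message really does append a tab-separated float weight to each
suggestion string:

    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    public class SuggestionSorter {
      // Parse the float weight appended after the last tab.
      static double weightOf(String s) {
        int i = s.lastIndexOf('\t');
        return i < 0 ? 0.0 : Double.parseDouble(s.substring(i + 1).trim());
      }

      // Sort suggestions by descending weight before presenting them.
      public static void sortByWeight(List<String> suggestions) {
        Collections.sort(suggestions, new Comparator<String>() {
          public int compare(String a, String b) {
            return Double.compare(weightOf(b), weightOf(a));
          }
        });
      }
    }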


On Sun, Nov 11, 2012 at 10:52 PM, 徐郑 eyun...@gmail.com wrote:

 Following is my config; it suggests words well.
 I want to get a sorted result when it suggests, so I added a transformer
 that appends a tab (\t) separated float weight string
 to the end of the Suggestion field, but the suggestion result still isn't
 sorted correctly.

 My suggest result (note the float number at the end is the weight):

 <lst name="spellcheck">
   <lst name="suggestions">
     <lst name="我">
       <int name="numFound">10</int>
       <int name="startOffset">1</int>
       <int name="endOffset">2</int>
       <arr name="suggestion">
         <str>我脑中的橡皮擦 2.12</str>
         <str>我老婆是大佬3 2.07</str>
         <str>我老婆是大佬2 2.12</str>




 schema.xml

 <field name="Suggestion" type="string" indexed="true" stored="true"/>



 solrconfig.xml

  <searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="field">Suggestion</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
      <!-- <float name="threshold">0.0001</float> -->
      <str name="spellcheckIndexDir">spellchecker</str>
      <str name="comparatorClass">freq</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>
  <requestHandler class="org.apache.solr.handler.component.SearchHandler"
                  name="/suggest">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

 --

 eyun

 The truth, whether or not

 Q:276770341   G+:eyun...@gmail.com



Re: How to speed up Facet count (Big index) ??!!!!

2012-11-13 Thread Upayavira
I'd say you are at a point where sharding may well help. But, as others
have suggested, you have other issues to consider first - less memory
for Solr and upgrade to a more modern Solr. 

Also, if as Yonik asks only the first query is slow, you can set up a
NewSearcher query in your solrconfig.xml to run this first query on
every commit, meaning your users will always get faster queries.

Upayavira
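
A hedged solrconfig.xml sketch of such a warming query (the facet field comes
from the quoted thread; everything else is illustrative). A matching
firstSearcher listener covers the very first searcher after startup:

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="rows">0</str>
          <str name="facet">true</str>
          <str name="facet.field">schoolname_hl</str>
          <str name="facet.method">fc</str>
        </lst>
      </arr>
    </listener>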

On Tue, Nov 13, 2012, at 11:16 AM, Aeroox Aeroox wrote:
 Thanks  Yonik.
 
 Should I consider sharding in this case (actually I have one big index
 with replication)? Or create two indexes (one for search and the other for facets,
 on a different machine)?
 
 Thanks folks
 
 With love from Paris (it's raining today :(
 
 Le mardi 13 novembre 2012, Yonik Seeley a écrit :
 
  On Mon, Nov 12, 2012 at 8:39 PM, Aeroox Aeroox 
  aero...@gmail.com
  wrote:
   Hi folks,
  
   I have a solr index with up to 50M documents. A document contains 62
  fields
   (docid, name, location).
  
   The facet count took 1 to 2 minutes with this params :
  
   http://.../select/?q=solr&version=2.2&start=0&rows=0&facet=true&facet.limit=6&facet.mincount=1&mm=3<-1&facet.field=schoolname_hl&facet.method=fc
 
  It should hopefully just take that long the first time?  How much time
  does it take to facet on the same field subsequent times?
 
   And my cache policy :
  
   <filterCache class="solr.FastLRUCache"
       size="4096"
       initialSize="4096"
       autowarmCount="4096"/>

   <queryResultCache class="solr.LRUCache"
       size="5000"
       initialSize="5000"
       autowarmCount="5000"/>
 
  These are relatively big caches - consider reducing them if you can.
  Especially the filter cache, depending on what percent of the entries
  are bitsets.
  Worst case would be 50M / 8 * 4096 = 25GB of bitsets.
 
   * I'm using solr 1.4 (LUCENE_36)
   * 64GB Ram (with 60GB allocated to java/tomcat6)
 
  Reduce this if you can - it doesn't leave enough memory for the OS to
  cache the index files and can contribute to slowness (more disk IO).
 
  -Yonik
  http://lucidworks.com
 


Re: Unable to run two multicore Solr instances under Tomcat

2012-11-13 Thread Erick Erickson
At a guess you have leftover jars from your earlier installation in your
classpath that are being picked up. I've always found that figuring out how
_that_ happened is...er... interesting...

Best
Erick


On Mon, Nov 12, 2012 at 7:44 AM, Adam Neal an...@mass.co.uk wrote:

 Hi,

 I have been running two multicore Solr instances under Tomcat using a
 nightly build of 4.0 from September 2011. This has been running fine but
 when I try to update these instances to the release version of 4.0 I'm
 hitting problems when the second instance starts up. If I have one instance
 on the release version and one on the nightly build it also works fine.

 It's running on a Solaris 10 box using Tomcat 6.0.26 and Java 1.6.0_20

 I can run up either instance on its own and it works fine; it's just when
 starting both together, so I'm pretty sure my configs aren't the issue.

 Snippet from the log is below, please note that I have had to type this
 out so there may be some typos, hopefully not!

 Any ideas?

 Adam


 12-Nov-2012 09:58:50 org.apache.solr.core.SolrResourceLoader locateSolrHome
 INFO: Using JNDI solr.home: /conf_solr/instance2
 12-Nov-2012 09:58:50 org.apache.solr.core.SolrResourceLoader init
 INFO: new SolrResourceLoader for deduced Solr Home: '/conf_solr/instance2/'
 12-Nov-2012 09:58:52 org.apache.solr.servlet.SolrDispatchFilter init
 INFO: SolrDispatchFilter.init()
 12-Nov-2012 09:58:52 org.apache.solr.core.SolrResourceLoader locateSolrHome
 INFO: Using JNDI solr.home /conf_solr/instance2
 12-Nov-2012 09:58:52 org.apache.solr.core.CoreContainer$Initializer
 initialize
 INFO: looking for solr.xml: /conf_solr/instance2/solr.xml
 12-Nov-2012 09:58:52 org.apache.solr.core.CoreContainer init
 INFO: New CoreContainer 15471347
 12-Nov-2012 09:58:52 org.apache.solr.core.CoreContainer load
 INFO: Loading CoreContainer using Solr Home: '/conf_solr/instance2/'
 12-Nov-2012 09:58:52 org.apache.solr.core.SolrResourceLoader init
 INFO: new SolrResourceLoader for directory: '/conf_solr/instance2/'
 12-Nov-2012 09:58:52 org.apache.solr.servlet.SolrDispatchFilter init
 SEVERE: Could not start Solr. Check solr/home property and the logs
 12-Nov-2012 09:58:52 org.apache.solr.common.SolrException log
 SEVERE: null:java.lang.ClassCastException:
 org.apache.xerces.parsers.XIncludeAwareParserConfiguration cannot be cast
 to org.apache.xerces.xni.parser.XMLParserConfiguration
 at org.apache.xerces.parsers.DOMParser.<init>(Unknown Source)
 at org.apache.xerces.parsers.DOMParser.<init>(Unknown Source)
 at org.apache.xerces.jaxp.DocumentBuilderImpl.<init>(Unknown
 Source)
 at
 org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.newDocumentBuilder(Unknown
 Source)
 at
 com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.createDocument(SAX2DOM.java:324)
 at
 com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.<init>(SAX2DOM.java:84)
 at
 com.sun.org.apache.xalan.internal.xsltc.runtime.output.TransletOutputHandlerFactory.getSerializationHandler(TransletOutputHandlerFactory.java:187)
 at
 com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getOutputHandler(TransformerImpl.java:392)
 at
 com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:298)
 at
 org.apache.solr.core.CoreContainer.copyDoc(CoreContainer.java:551)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:381)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
 at
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
 at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
 at
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
 at
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
 at
 org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
 at
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3838)
 at
 org.apache.catalina.core.StandardContext.start(StandardContext.java:4488)
 at
 org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
 at
 org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
 at
 org.apache.catalina.core.StandardHost.addChild(StandardHost.java:546)
 at
 org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:637)
 at
 org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:563)
 at
 org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:498)
 at
 org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
 at
 org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
 at
 org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
 

Re: Admin Permissions

2012-11-13 Thread Erick Erickson
Slap them firmly on the wrist if they do?

The Solr admin is really designed with trusted users in mind. There are no
provisions that I know of for securing some of the functions.

Your developers have access to the Solr server through the browser, right?
They can do all of that via URL, see: http://wiki.apache.org/solr/CoreAdmin,
they don't need to use the admin server at all.

So unless you're willing to put a lot of effort into it, I don't think you
really can lock it down. If you really don't trust them to not do bad
things, set up a dev environment and lock them out of your production
servers totally?

Best
Erick
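
If it is worth some effort, one hedged option is a servlet-container
constraint in Solr's web.xml that puts the CoreAdmin URL behind a role (paths
and role name are illustrative, and users who can reach other handlers can
still do plenty):

    <security-constraint>
      <web-resource-collection>
        <web-resource-name>CoreAdmin</web-resource-name>
        <url-pattern>/admin/cores</url-pattern>
        <url-pattern>/admin/cores/*</url-pattern>
      </web-resource-collection>
      <auth-constraint>
        <role-name>solr-admin</role-name>
      </auth-constraint>
    </security-constraint>
    <login-config>
      <auth-method>BASIC</auth-method>
    </login-config>
    <security-role>
      <role-name>solr-admin</role-name>
    </security-role>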


On Mon, Nov 12, 2012 at 12:41 PM, Michael Long ml...@bizjournals.com wrote:

 I really like the new admin in solr 4.0, but specifically I don't want
 developers to be able to unload, rename, swap, reload, optimize, or add
 cores.

 Any ideas on how I could still give access to the rest of the admin
 without giving access to these? It is very helpful for them to have access
 to the Query, Analysis, etc.



RE: Unable to run two multicore Solr instances under Tomcat

2012-11-13 Thread Adam Neal
Hi Erick,

Thanks for the info. I figured out that it was a jar problem earlier today, but 
I don't think it is an old jar. Both of the instances I ran included the 
extraction libraries, and it appears that the problem is due to 
xercesImpl-2.9.1.jar. If I remove the extraction tool jars from one of the 
instances, or even just that specific jar, then everything works as normal. 
Fortunately I only need the extraction tools in one of my instances, so this 
workaround is good for now.

I can't see any old jars that would interfere; I will try to test this at some 
point on a clean install of 4.0 and see if the same problem occurs.


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tue 13/11/2012 12:05
To: solr-user@lucene.apache.org
Subject: Re: Unable to run two multicore Solr instances under Tomcat
 
At a guess you have leftover jars from your earlier installation in your
classpath that are being picked up. I've always found that figuring out how
_that_ happened is...er... interesting...

Best
Erick


On Mon, Nov 12, 2012 at 7:44 AM, Adam Neal an...@mass.co.uk wrote:

 Hi,

 I have been running two multicore Solr instances under Tomcat using a
 nightly build of 4.0 from September 2011. This has been running fine but
 when I try to update these instances to the release version of 4.0 I'm
 hitting problems when the second instance starts up. If I have one instance
 on the release version and one on the nightly build it also works fine.

 It's running on a Solaris 10 box using Tomcat 6.0.26 and Java 1.6.0_20

 I can run up either instance on it's own and it works fine, it's just when
 starting both together so I'm pretty sure my configs aren't the issue.

 Snippet from the log is below, please note that I have had to type this
 out so there may be some typos, hopefully not!

 Any ideas?

 Adam


 12-Nov-2012 09:58:50 org.apache.solr.core.SolrResourceLoader locateSolrHome
 INFO: Using JNDI solr.home: /conf_solr/instance2
 12-Nov-2012 09:58:50 org.apache.solr.core.SolrResourceLoader init
 INFO: new SolrResourceLoader for deduced Solr Home: '/conf_solr/instance2/'
 12-Nov-2012 09:58:52 org.apache.solr.servlet.SolrDispatchFilter init
 INFO: SolrDispatchFilter.init()
 12-Nov-2012 09:58:52 org.apache.solr.core.SolrResourceLoader locateSolrHome
 INFO: Using JNDI solr.home /conf_solr/instance2
 12-Nov-2012 09:58:52 org.apache.solr.core.CoreContainer$Initializer
 initialize
 INFO: looking for solr.xml: /conf_solr/instance2/solr.xml
 12-Nov-2012 09:58:52 org.apache.solr.core.CoreContainer init
 INFO: New CoreContainer 15471347
 12-Nov-2012 09:58:52 org.apache.solr.core.CoreContainer load
 INFO: Loading CoreContainer using Solr Home: '/conf_solr/instance2/'
 12-Nov-2012 09:58:52 org.apache.solr.core.SolrResourceLoader init
 INFO: new SolrResourceLoader for directory: '/conf_solr/instance2/'
 12-Nov-2012 09:58:52 org.apache.solr.servlet.SolrDispatchFilter init
 SEVERE: Could not start Solr. Check solr/home property and the logs
 12-Nov-2012 09:58:52 org.apache.solr.common.SolrException log
 SEVERE: null:java.lang.ClassCastException:
 org.apache.xerces.parsers.XIncludeAwareParserConfiguration cannot be cast
 to org.apache.xerces.xni.parser.XMLParserConfiguration
 at org.apache.xerces.parsers.DOMParser.<init>(Unknown Source)
 at org.apache.xerces.parsers.DOMParser.<init>(Unknown Source)
 at org.apache.xerces.jaxp.DocumentBuilderImpl.<init>(Unknown
 Source)
 at
 org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.newDocumentBuilder(Unknown
 Source)
 at
 com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.createDocument(SAX2DOM.java:324)
 at
 com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.<init>(SAX2DOM.java:84)
 at
 com.sun.org.apache.xalan.internal.xsltc.runtime.output.TransletOutputHandlerFactory.getSerializationHandler(TransletOutputHandlerFactory.java:187)
 at
 com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getOutputHandler(TransformerImpl.java:392)
 at
 com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:298)
 at
 org.apache.solr.core.CoreContainer.copyDoc(CoreContainer.java:551)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:381)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
 at
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
 at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
 at
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
 at
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
 at
 org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
 at
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3838)
 at
 

Re: java.io.IOException: Map failed :: OutOfMemory

2012-11-13 Thread Erick Erickson
Have you tried the really simple solution of giving your JVM more memory
(-Xmx option)?

Best
Erick


On Tue, Nov 13, 2012 at 2:38 AM, uwe72 uwe.clem...@exxcellent.de wrote:

 Version is 3.6.1 of solr






Re: Solr Indexing MAX FILE LIMIT

2012-11-13 Thread Erick Erickson
Have you considered writing a small SolrJ (or other client) program that
processed the rows in your huge file and sent them to solr in sensible
chunks? That would give you much finer control over how the file was
processed, how many docs were sent to Solr at a time, what to do with
errors. You could even run N simultaneous programs to increase throughput...

FWIW,
Erick
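
A hedged sketch of such a loader (the column layout, field names, and batch
size are assumptions, not from the thread):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    // Stream the CSV row by row and send documents in fixed-size batches,
    // so the 10 GB file never has to travel as one giant POST.
    public class CsvLoader {
      public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8080/solr");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        BufferedReader in = new BufferedReader(new FileReader("D:/eighth.csv"));
        String line;
        while ((line = in.readLine()) != null) {
          String[] cols = line.split(",");   // naive split; real CSV may need a parser
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", cols[0]);
          doc.addField("name", cols[1]);
          batch.add(doc);
          if (batch.size() == 1000) {        // send in sensible chunks
            solr.add(batch);
            batch.clear();
          }
        }
        if (!batch.isEmpty()) solr.add(batch);
        solr.commit();                       // one commit at the end
        in.close();
      }
    }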


On Tue, Nov 13, 2012 at 3:42 AM, mitra mitra.re...@ornext.com wrote:

 Thank you.


 *** I understand that the default size for an HTTP POST in Tomcat is 2 MB. Can
 we change that somehow,
    so that I don't need to split the 10 GB CSV into 2 MB chunks?

 curl http://localhost:8080/solr/update/csv -F stream.file=D:\eighth.csv
 -F
 commit=true -F optimize=true -F encapsulate= -F keepEmpty=true

 *** As I mentioned, I'm using the above command to post, rather than
 this format below:

 curl http://localhost:8080/solr/update/csv --data-binary @eighth.csv -H
 'Content-type:text/plain; charset=utf-8'

 *** My question: is the limit still applicable even when not using the
 data-binary format above?







Re: java.io.IOException: Map failed :: OutOfMemory

2012-11-13 Thread uwe72
Thanks Erick. We are using:

export JAVA_OPTS="-XX:MaxPermSize=400m -Xmx2000m -Xms200M
-Dsolr.solr.home=/home/connect/ConnectPORTAL/preview/solr-home"

We have around 5 million documents. The index size is around 50 GB.

Before we add a document we delete the same id, no matter
whether the doc exists or not.

We use the SolrJ functionality to delete a list of ids.

The error always occurs during this deletion.








Re: Is leading wildcard search turned on by default in Solr 3.6.1?

2012-11-13 Thread Dmitry Kan
Just a quick comment from our experience: since we have quite a lot of data
indexed in our Solr, we take some extra measures to ensure that no bogus
wildcard queries are accepted by the system (for instance *, **, ***, etc.),
and that is done in the QueryParser. I wanted to mention this approach as one
way of handling simple query security checks.

-- Dmitry

On Tue, Nov 13, 2012 at 6:22 AM, Jack Krupansky j...@basetechnology.com wrote:

 Be sure to realize that even with reverse wildcard support, the user can
 add a trailing wildcard as well (double-ended wildcard) and then you are
 back in the same boat.

 The overall idea is that: 1) Hardware is much faster than just 3 or 4
 years ago, and 2) even though document counts are getting much larger, the
 number of unique terms (which is all that matters for wildcard performance)
 does not tend to grow as fast as document count grows. And, some fields
 have a much more limited vocabulary (unique terms), so a leading wildcard
 is not necessarily a big performance hit.

 Technology advances. We should permit our mindsets to advance as well.

 -- Jack Krupansky


 -Original Message- From: François Schiettecatte
 Sent: Monday, November 12, 2012 2:38 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Is leading wildcard search turned on by default in Solr 3.6.1?


 John

 You can still use leading wildcards even if you don't have the
 ReversedWildcardFilterFactory in your analysis, but it means you will be
 scanning the entire dictionary when the search is run, which can be a
 performance issue. If you do use ReversedWildcardFilterFactory you won't
 have that performance issue, but you will increase the overall size of your
 index. It's a tradeoff.

 When I looked into it for a site I built, I decided that the tradeoff was
 not worth it (after benchmarking), given how few leading-wildcard searches
 it was getting.

 Best regards

 François


 On Nov 12, 2012, at 5:33 PM, johnmu...@aol.com wrote:



 Hi,


 I'm migrating from Solr 1.2 to 3.6.1.  I used the same analyzer as before,
 and re-indexed my data.  I did not add
 solr.ReversedWildcardFilterFactory to my index analyzer, and yet
 leading wildcards are working!  Does this mean it's turned on by default?
  If so, how do I turn it off, and what are the implications of leaving it ON?
 Won't my searches be slower and consume more memory?


 Thanks,


 --MJ




-- 
Regards,

Dmitry Kan


Re: Removing Shards from Zookeeper - no servers hosting shard

2012-11-13 Thread Mark Miller
Odd...the unload command should be enough...
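
For reference, a hedged sketch of that call (host and core name are
illustrative; deleteIndex, where your build supports it, also removes the
index directory):

    curl "http://localhost:8080/solr/admin/cores?action=UNLOAD&core=02_10_2012_experiment&deleteIndex=true"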

On Tue, Nov 13, 2012 at 5:26 AM, Gilles Comeau gilles.com...@polecat.co wrote:
 Hi all,

 We've just updated to SOLR 4.0 production and Zookeeper 3.3.6 from SOLR 4.0 
 development version circa November 2011.  We keep 6 months of data online in 
 our primary cluster, and archive off old stuff to a slower disk archive 
 cluster.   We used to remove SOLR cores with the following code, but 
 everything has changed in Zookeeper now.

 Old code to remove cores from Zookeeper:


  curl "http://127.0.0.1:8080/solr/admin/cores?action=UNLOAD&core=${SHARD}"

 echo Removing indexes from all Zookeeper hosts
  for (( i=0; i<${#ZK_HOSTS[*]}; i++ ))
 do
 $JAVA -cp 
 .:/apps/zookeeper-3.3.5/zookeeper-3.3.5.jar:/apps/zookeeper-3.3.5/lib/jline-0.9.94.jar:/apps/zookeeper-3.3.5/lib/log4j-1.2.15.jar
  org.apache.zookeeper.ZooKeeperMain -server ${ZK_HOSTS[$i]} delete 
 /collections/polecat/shards/solrenglish:8080_solr_$SHARD/$HOSTNAME:8080_solr_$SHARD
 $JAVA -cp 
 .:/apps/zookeeper-3.3.5/zookeeper-3.3.5.jar:/apps/zookeeper-3.3.5/lib/jline-0.9.94.jar:/apps/zookeeper-3.3.5/lib/log4j-1.2.15.jar
  org.apache.zookeeper.ZooKeeperMain -server ${ZK_HOSTS[$i]} delete 
 /collections/polecat/shards/solrenglish:8080_solr_$SHARD
  done

  curl "http://solrmaster01:8080/solr/admin/cores?action=RELOAD&core=master"

 Now that we have migrated, I have tried removing cores from Zookeeper by 
 removing the stuff for the unloaded core in leaders and leader_elect, but 
  for some reason SOLR keeps sending the requests to the shard, and I end up 
  with the "no servers hosting shard" error.

 Does anyone know how to remove a SOLR core from a SOLR server and have 
 Zookeeper updated, and have distributed queries still work?   The only thing 
 I know how to do now is stop tomcat, stop zookeeper, clear out the data 
 directory and then restart both.   This isn't really ideal for a process I'd 
 like to have running each night, and surely it is something others have hit.  
 I've tried google searching, and what I find is references to the bug where 
 solr notifies zookeeper on core unloads which is marked as fixed, and people 
 talking about how it doesn't work, but that if you run reloads on each core, it 
 will work.  (also doesn't work when I do it)

 Regards,

 Gilles Comeau



-- 
- Mark


Re: Role/purpose of Overseer?

2012-11-13 Thread Mark Miller
The Overseer isn't mentioned much because it's an implementation
detail that the user doesn't have to really consider.

The Overseer first came about to handle writing the clusterstate.json
file, as a suggestion by Ted Dunning.

Originally, each node would try to update the clusterstate.json file
itself - using optimistic locking and retries.

We decided that a cleaner method was to have an overseer and let new
nodes register themselves and their latest state as part of a list -
the Overseer then watches this list, and when things change, publishes
a new clusterstate.json - no optimistic locking and retries needed.
All the other nodes watch clusterstate.json and are notified to
re-read it when it changes.

Since, the Overseer has picked up a few other duties when it makes
sense. For example, it handles the shard assignments if a user does
not specify them. It also does the work for the collections api -
eventually this will be beneficial in that it will use a distributed
work queue and be able to resume operations that fail before
completing.

I think over time, there are lots of useful applications for the Overseer.

He is elected in the same manner as a leader for a shard - if the
Overseer goes down, someone simply takes his place.

I don't think the Overseer is going away any time soon.

- Mark

On Mon, Nov 12, 2012 at 9:48 PM, Otis Gospodnetic
otis.gospodne...@gmail.com wrote:
 Hi,

 I was looking at http://wiki.apache.org/solr/SolrCloud and noticed that
 while overseer is mentioned a handful of times, there is nothing there
 that explains what exactly Overseer does.

 This 8-word Javadoc is the best I could find:
 http://search-lucene.com/jd/solr/solr-core/org/apache/solr/cloud/Overseer.html

 The first diagram on http://wiki.apache.org/solr/SolrCloud shows 1
 Overseer.  Does that make it a SPOF?  If not, what happens when it goes
 down?

 Also, is Overseer here to stay?
The other day, I saw an issue in JIRA questioning its use or something
along those lines.

 Thanks,
 Otis
 --
 Search Analytics - http://sematext.com/search-analytics/index.html
 Performance Monitoring - http://sematext.com/spm/index.html



-- 
- Mark


Re: Nested Join Queries

2012-11-13 Thread Mikhail Khludnev
Please find reference materials:

http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
http://blog.griddynamics.com/2012/08/block-join-query-performs.html
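
For orientation: block join indexes each parent document together with its
children as one contiguous block, so the join is positional rather than
key-based. A hedged sketch of the query shape, using the thread's field names
(the {!parent} parser shown here appeared in Solr releases after this thread;
the posts above describe the original patch):

    q={!parent which="doc_type:author"}bestseller:true

That returns authors having at least one best-selling book; a
{!child of="doc_type:author"} query goes the other direction, from matching
authors back to their books.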



On Tue, Nov 13, 2012 at 3:25 PM, Gerald Blanck 
gerald.bla...@barometerit.com wrote:

 Thank you.  I've not heard of BlockJoin.  I will look into it today.
  Thanks.


 On Tue, Nov 13, 2012 at 5:05 AM, Mikhail Khludnev 
 mkhlud...@griddynamics.com wrote:

 Replied. pls check maillist.



 On Tue, Nov 13, 2012 at 11:44 AM, Mikhail Khludnev 
 mkhlud...@griddynamics.com wrote:

 Gerald,

  I wonder if you have tried BlockJoin for your problem? Can you
  afford less frequent updates?


 On Wed, Nov 7, 2012 at 5:40 PM, Gerald Blanck 
 gerald.bla...@barometerit.com wrote:

 Thank you Erick for your reply.  I understand that search is not an
 RDBMS.
  Yes, we do have a huge combinatorial explosion if we de-normalize and
 duplicate data.  In fact, I believe our use case is exactly what the
 Solr
 developers were trying to solve with the addition of the Join query.
  And
 while the example I gave illustrates the problem we are solving with the
 Join functionality, it is simplistic in nature compared to what we have
 in
 actuality.

 Am still looking for an answer here if someone can shed some light.
  Thanks.


 On Sat, Nov 3, 2012 at 9:38 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  I'm going to go a bit sideways on you, partly because I can't answer
 the
  question G...
 
  But, every time I see someone doing what looks like substituting
 core for
  table and
  then trying to use Solr like a DB, I get on my soap-box and
 preach..
 
  In this case, consider de-normalizing your DB so you can ask the
 query in
  terms
  of search rather than joins. e.g.
 
  Make each document a combination of the author and the book, with an
  additional
  field author_has_written_a_bestseller. Now your query becomes a
 really
  simple
  search, author:name AND author_has_written_a_bestseller:true. True,
 this
  kind
  of approach isn't as flexible as an RDBMS, but it's a _search_ rather
 than
  a query.
  Yes, it replicates data, but unless you have a huge combinatorial
  explosion, that's
  not a problem.
 
  And the join functionality isn't called pseudo for nothing. It was
  written for a specific
  use-case. It is often expensive, especially when the field being
 joined has
  many unique
  values.
 
  FWIW,
  Erick
 
 
  On Fri, Nov 2, 2012 at 11:32 AM, Gerald Blanck 
  gerald.bla...@barometerit.com wrote:
 
   At a high level, I have a need to be able to execute a query that
 joins
   across cores, and that query during its joining may join back to the
   originating core.
  
   Example:
   Find all Books written by an Author who has written a best selling
 Book.
  
   In Solr query syntax
   A) against the book core - bestseller:true
   B) against the author core - {!join fromIndex=book from=id
   to=bookid}bestseller:true
   C) against the book core - {!join fromIndex=author from=id
   to=authorid}{!join fromIndex=book from=id to=bookid}bestseller:true
  
   A - returns results
   B - returns results
   C - does not return results
  
   Given that A and C use the same core, I started looking for join
 code
  that
   compares the originating core to the fromIndex and found this
   in JoinQParserPlugin (line #159).
  
    if (info.getReq().getCore() == fromCore) {
      // if this is the same core, use the searcher passed in...
      // otherwise we could be warming and get an older searcher from the core.
      fromSearcher = searcher;
    } else {
      // This could block if there is a static warming query with a
      // join in it, and if useColdSearcher is true.
      // Deadlock could result if two cores both had useColdSearcher
      // and had joins that used eachother.
      // This would be very predictable though (should happen every
      // time if misconfigured)
      fromRef = fromCore.getSearcher(false, true, null);

      // be careful not to do anything with this searcher that requires
      // the thread local SolrRequestInfo in a manner that requires the
      // core in the request to match
      fromSearcher = fromRef.get();
    }
  
   I found that if I were to modify the above code so that it always
 follows
   the logic in the else block, I get the results I expect.
  
   Can someone explain to me why the code is written as it is?  And if
 we
  were
   to run with only the else block being executed, what type of adverse
   impacts we might have?
  
   Does anyone have other ideas on how to solve this issue?
  
   Thanks in advance.
   -Gerald
  
 



 --

 Gerald Blanck

 barometerIT

 1331 Tyler Street NE, Suite 100
 Minneapolis, MN 55413


 612.208.2802

 gerald.bla...@barometerit.com




 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 

Re: java.io.IOException: Map failed :: OutOfMemory

2012-11-13 Thread uwe72
Kernel: 2.6.32.29-0.3-default #1 SMP 2011-02-25 13:36:59 +0100 x86_64
x86_64 x86_64 GNU/Linux

SUSE Linux Enterprise Server 11 SP1  (x86_64)

physical Memory: 4 GB

portadm@smtcax0033:/srv/connect/tomcat/instances/SYSTEST_Portal_01/bin
java -version
java version 1.6.0_33
Java(TM) SE Runtime Environment (build 1.6.0_33-b03) Java HotSpot(TM) 64-Bit
Server VM (build 20.8-b03, mixed mode)
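
One hedged avenue to check before (or alongside) raising -Xmx: a "Map failed"
OutOfMemoryError usually means MMapDirectory ran out of virtual address space
or memory-map areas while mapping the ~50 GB index, not heap. On Linux the
usual suspects are (values illustrative):

    # per-process limits the JVM runs under
    ulimit -v    # virtual memory; MMapDirectory wants "unlimited"

    # kernel cap on memory-mapped regions per process
    cat /proc/sys/vm/max_map_count
    sudo sysctl -w vm.max_map_count=262144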






Re: solr4.0 problem zkHost with multiple hosts throws out of range exception

2012-11-13 Thread Mark Miller
On Tue, Nov 13, 2012 at 12:22 AM, deniz denizdurmu...@gmail.com wrote:
 so do we need to add one of the servers from the -DzkHost string to -DzkRun?
 should it look like

 -DzkRun=host1:port -DzkHost=host:port, host1:port, host2:port in the
 start up command?

Yeah, something to that effect.



 and will the wiki page be updated? Because the example there still leads
 into the error that was mentioned here nearly a month ago...


Yeah, it would be nice if the wiki pointed this out. It shouldn't
necessarily be required for the example, because it should work with
localhost with just zkRun - but that does set you up for failure when
you move to multiple machines - so the wiki should point it out.


-- 
- Mark


Re: solr4.0 problem zkHost with multiple hosts throws out of range exception

2012-11-13 Thread Mark Miller
On Tue, Nov 13, 2012 at 12:22 AM, deniz denizdurmu...@gmail.com wrote:
 so do we need to add one of the servers from the -DzkHost string to -DzkRun?

By the way - not just any of the servers has to be added to zkRun -
but the address for the current server - that is, the server you are
running the command on. This is so we know which of the zk addresses
belongs to the localhost. It lets us handle some of the zookeeper setup for
you (specifying a myid for each node).
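
Putting the two together, a hedged sketch of the startup on host1 (ports and
the start.jar invocation are illustrative, following the wiki's
embedded-ZooKeeper examples):

    java -DzkRun=host1:9983 \
         -DzkHost=host1:9983,host2:9983,host3:9983 \
         -jar start.jar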


RE: Removing Shards from Zookeeper - no servers hosting shard

2012-11-13 Thread Gilles Comeau
When I do the unload through the UI, I see the below messages in the solr log.  
 Nothing in the zookeeper log.



Then right after I try:  
http://217.147.83.124:9090/solr/experiment_master/select?q=*%3A*&wt=xml&distrib=true
  and get  <str name="msg">no servers hosting shard:</str>.   Also, I still see 
the shard being referenced in the cloud tab in the UI.






Does this work for anyone else using SOLR 4.0 production with external 
zookeeper and distributed queries and if so, can you let me know exactly what 
versions and steps you take to not get this error? ☺   Anyone else have any 
problems getting this to work?



My setup is pretty basic:  Local external zookeeper  3.3.6, solr 4.0 with three 
cores seen above.



Regards, Gilles



INFO: [02_10_2012_experiment]  CLOSING SolrCore 
org.apache.solr.core.SolrCore@11e3c2c6

13-Nov-2012 16:19:13 org.apache.solr.core.SolrCore closeSearcher

INFO: [02_10_2012_experiment] Closing main searcher on request.

13-Nov-2012 16:19:13 org.apache.solr.search.SolrIndexSearcher close

FINE: Closing Searcher@7cd47880 main


fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=7,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}


filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=1,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}


queryResultCache{lookups=4,hits=3,hitratio=0.75,inserts=2,evictions=0,size=2,warmupTime=0,cumulative_lookups=4,cumulative_hits=3,cumulative_hitratio=0.75,cumulative_inserts=1,cumulative_evictions=0}


documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

13-Nov-2012 16:19:13 org.apache.solr.core.CachingDirectoryFactory close

FINE: Closing: 
CachedDirorg.apache.lucene.store.MMapDirectory@/solr2/cores/02_10_2012/data/index
 
lockFactory=org.apache.lucene.store.NativeFSLockFactory@717757ad;refCount=1;path=/solr2/cores/02_10_2012/data/index;done=false

13-Nov-2012 16:19:13 org.apache.solr.update.DirectUpdateHandler2 close

INFO: closing DirectUpdateHandler2{commits=0,autocommits=0,soft 
autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0}

13-Nov-2012 16:19:13 org.apache.solr.update.DefaultSolrCoreState decref

INFO: SolrCoreState ref count has reached 0 - closing IndexWriter

13-Nov-2012 16:19:13 org.apache.solr.update.DefaultSolrCoreState decref

INFO: Closing SolrCoreState - canceling any ongoing recovery

13-Nov-2012 16:19:13 org.apache.solr.core.CoreContainer persistFile

INFO: Persisting cores config to /solr2/solr.xml

13-Nov-2012 16:19:13 org.apache.solr.core.Config getVal

FINE: null solr/cores/@adminPath=/admin/cores

13-Nov-2012 16:19:13 org.apache.solr.core.Config getNode

FINE: null missing optional solr/cores/@shareSchema

13-Nov-2012 16:19:13 org.apache.solr.core.Config getVal

FINE: null solr/cores/@hostPort=9090

13-Nov-2012 16:19:13 org.apache.solr.core.Config getVal

FINE: null solr/cores/@zkClientTimeout=1

13-Nov-2012 16:19:13 org.apache.solr.core.Config getVal

FINE: null solr/cores/@hostContext=solr

13-Nov-2012 16:19:13 org.apache.solr.core.Config getNode

FINE: null missing optional solr/cores/@leaderVoteWait

13-Nov-2012 16:19:13 org.apache.solr.core.SolrXMLSerializer persistFile

INFO: Persisting cores config to /solr2/solr.xml

13-Nov-2012 16:19:13 org.apache.solr.common.cloud.ZkStateReader 
updateClusterState

INFO: Updating cloud state from ZooKeeper...

13-Nov-2012 16:19:13 org.apache.solr.common.cloud.ZkStateReader$2 process

INFO: A cluster state change has occurred - updating...



-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: 13 November 2012 14:13
To: solr-user@lucene.apache.org
Subject: Re: Removing Shards from Zookeeper - no servers hosting shard



Odd...the unload command should be enough...



On Tue, Nov 13, 2012 at 5:26 AM, Gilles Comeau 
gilles.com...@polecat.comailto:gilles.com...@polecat.co wrote:

 Hi all,



 We've just updated to SOLR 4.0 production and Zookeeper 3.3.6 from SOLR 4.0 
 development version circa November 2011.  We keep 6 months of data online in 
 our primary cluster, and archive off old stuff to a slower disk archive 
 cluster.   We used to remove SOLR cores with the following code, but 
 everything has changed in Zookeeper now.



 Old code to remove cores from Zookeeper:





 curl 
 

RE: Removing Shards from Zookeeper - no servers hosting shard

2012-11-13 Thread Gilles Comeau
Sorry, forgot: pictures are no good. From cluster.json, the same information; 
the shard for the core I unloaded sticks around: 
"solrexperiment:8080_solr_experiment_02_10_2012":{"replicas":{

Do I need a special command to delete the shard or something? I've never seen 
a command that does that.

Regards, Gilles

  "experiment":{
    "solrexperiment:8080_solr_experiment_master":{"replicas":{"IS-17093:9090_solr_experiment_master":{
      "shard":"solrexperiment:8080_solr_experiment_master",
      "roles":null,
      "state":"active","core":"experiment_master","collection":"experiment","node_name":"IS-17093:9090_solr","base_url":"http://IS-17093:9090/solr","leader":"true"}}},
    "solrexperiment:8080_solr_experiment_01_10_2012":{"replicas":{"IS-17093:9090_solr_01_10_2012_experiment":{
      "shard":"solrexperiment:8080_solr_experiment_01_10_2012","roles":null,"state":"active","core":"01_10_2012_experiment",
      "collection":"experiment","node_name":"IS-17093:9090_solr","base_url":"http://IS-17093:9090/solr","leader":"true"}}},
    "solrexperiment:8080_solr_experiment_02_10_2012":{"replicas":{


From: Gilles Comeau [mailto:gilles.com...@polecat.co]
Sent: 13 November 2012 16:29
To: solr-user@lucene.apache.org; markrmil...@gmail.com
Subject: RE: Removing Shards from Zookeeper - no servers hosting shard



Re: Testing Solr Cloud with ZooKeeper

2012-11-13 Thread darul
https://issues.apache.org/jira/browse/SOLR-3993 has been resolved.

Just a few questions: is it in trunk? I mean, is it in the main distribution
downloadable from the main Solr site?

Because I have downloaded it and still get the same behaviour while running
the first instance... or the second shard.





Re: java.io.IOException: Map failed :: OutOfMemory

2012-11-13 Thread uwe72
Today, the same exception:

INFO: [] webapp=/solr path=/update
params={waitSearcher=truecommit=truewt=javabinwaitFlush=trueversion=2}
status=0 QTime=1009 
Nov 13, 2012 2:02:27 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1

commit{dir=/net/smtcax0033/connect/Portal/solr-home/data/index,segFN=segments_3gm,version=1352803609067,generation=4486,filenames=[_21c.fdt,
_4mv.tis, _4mh.fnm, _1si.fdt, _4n0.fdx, _4mx.nrm, _1si.fdx, _2n0.nrm,
_2n0.prx, _4mv.tii, _3ii.fnm, _4mz.tvd, _4mv.nrm, _2ie.frq, _1l9.fnm,
_4my.fnm, _21c.fdx, _308.tvd, _4mz.tvf, _308.tvf, _sc.tis, _4mw.tii,
_4n1.fnm, _4mv.fdt, _1o2.nrm, _1si.nrm, _4mw.fdt, _it.tvf, _4mv.fdx,
_sc.tii, _4mw.tis, _4mw.fdx, _37y.tvx, _4mz.tvx, _4mh.nrm, _1si.prx,
_1o2.prx, _it.tvx, _3ii.tis, _3yn.nrm, _43w.tii, _37y.tvd, _3yn.prx,
_308.prx, _cv.nrm, _37y.tvf, _1b9.nrm, _3xp.frq, _43w.tis, _4mf.tvf,
_4mf.tvd, _1b9.fdt, _4ag.fdt, _1b9.fdx, _4mz.frq, _4ag.fdx, _418.tvx,
_4mf.tvx, _418.frq, _473.tis, _3ii.nrm, _4mx.fnm, _cv.frq, _3yn.tvd,
_418.tvd, _3yn.tvf, _418.tvf, _2ie.tvf, _2ie.tvd, _sc.frq, _1b9.frq,
_4ag.nrm, _37y.tii, _cv.prx, _4mx.tis, _4ag.prx, _2ie.tvx, _2n0.fdx,
_4mx.tii, _4mh.prx, _4my.prx, _4mz.nrm, _4lc.prx, _2ie.nrm, _3yn.tis,
_4n0.tii, _4mw.prx, _3yn.tvx, _it.fnm, _2n0.fdt, _4ag.frq, _21c.tvf,
_21c.tvd, _21c.nrm, _43w.prx, _308.fdt, _4my.frq, _1si.tvx, _4n3.prx,
_3yn.tii, _37y.tis, _4dj.fdt, _473.frq, _1l9.prx, _2ie.fnm, _4dj.fdx,
_308.fdx, _473.tvx, _cv.fdx, _4mz.tii, _473.tii, _cv.fdt, _3xp.tii,
_4lc.nrm, _2em.fnm, _it.tis, _418.fdx, _4n3.fdx, _3xp.tis, _418.fdt,
_1ih.fdx, _it.tii, _4n3.fdt, _4ix.tis, _1ih.fdt, _4lc.fdt, _4ix.tii,
_4mz.tis, _1b9.prx, _4n0.tis, _4lc.fdx, _473.tvd, _1ih.nrm, _2n0.frq,
_473.tvf, _4mz.fdx, _sc.fdx, _it.nrm, _4mz.fdt, _4my.tvx, _4mx.tvf,
_3ii.tii, _1b9.tvf, _4mx.tvd, _1b9.tvd, _418.prx, _3ii.tvx, _3xp.fnm,
_4mv.tvx, _sc.fdt, _sc.prx, segments_3gm, _418.fnm, _2n0.tii, _4mf.tis,
_sc.nrm, _4mf.tii, _4dj.nrm, _3ii.tvd, _1ih.frq, _3ii.tvf, _4n1.prx,
_1o2.tii, _37y.frq, _2em.prx, _4n3.frq, _4ix.fdt, _473.fdt, _21c.prx,
_1o2.tvx, _3xp.nrm, _473.fdx, _sc.fnm, _2n0.tis, _43w.fdt, _4mf.fnm,
_4ix.fdx, _43w.fdx, _4dj.tis, _473.nrm, _4my.tvf, _4mx.tvx, _4mv.tvd,
_1o2.tvd, _4my.tvd, _1o2.tvf, _4dj.tii, _4mv.frq, _1si.tvf, _4mv.tvf,
_1si.tvd, _473.fnm, _4ix.frq, _cv.tvx, _4dj.tvd, _21c.tii, _473.prx,
_4n1.tvx, _1ih.tvx, _1si.tis, _cv.tvf, _4ag.fnm, _1b9.tvx, _1ih.tvf,
_1l9.fdx, _4lc.tii, _1ih.tvd, _4n1.fdx, _4lc.tis, _1l9.fdt, _21c.tis,
_4dj.tvf, _1si.tii, _4n1.fdt, _4n0.fnm, _cv.tvd, _it.frq, _4mv.prx,
_4mh.tis, _3xp.tvf, _4n0.tvf, _3xp.tvd, _4n0.tvd, _4mx.fdx, _4my.nrm,
_4dj.frq, _4mx.fdt, _43w.frq, _1o2.frq, _4n0.tvx, _it.tvd, _1si.fnm,
_4n3.tvx, _3xp.tvx, _4mz.prx, _4my.tis, _21c.tvx, _37y.prx, _1ih.tii,
_4ix.prx, _4mh.fdt, _2n0.fnm, _4n3.tvf, _21c.fnm, _4mh.fdx, _2em.tvx,
_1b9.tii, _308.frq, _4mx.prx, _37y.fdx, _3yn.fnm, _4n3.tvd, _4mh.tii,
_4ag.tis, _4my.tii, _1b9.tis, _2ie.prx, _1ih.prx, _4ag.tii, _4n1.tvd,
_1ih.fnm, _3ii.prx, _4ix.nrm, _4n1.tvf, _4n1.nrm, _2em.tvd, _4mv.fnm,
_4mw.fnm, _37y.nrm, _it.fdx, _4mf.frq, _4n0.nrm, _3ii.frq, _it.fdt,
_1o2.tis, _37y.fdt, _4dj.tvx, _4n3.fnm, _4lc.fnm, _4my.fdt, _4lc.frq,
_2em.tvf, _4my.fdx, _37y.fnm, _4n0.prx, _1l9.tvd, _418.nrm, _2em.tis,
_4mw.nrm, _3xp.prx, _2ie.tis, _3xp.fdx, _1l9.frq, _1l9.tvf, _4mf.nrm,
_2em.tii, _4ix.fnm, _3xp.fdt, _4mh.tvd, _4mh.tvf, _2ie.tii, _1o2.fdt,
_4mh.tvx, _4mf.fdt, _4n0.frq, _308.tii, _4mw.tvx, _4ag.tvx, _308.tis,
_4n1.frq, _4mf.fdx, _sc.tvd, _sc.tvf, _3yn.fdt, _4mw.tvf, _4ag.tvf,
_4mw.tvd, _3yn.fdx, _1o2.fdx, _43w.fnm, _1o2.fnm, _4ag.tvd, _1si.frq,
_sc.tvx, _cv.tis, _4dj.fnm, _4mh.frq, _1ih.tis, _4lc.tvf, _2em.fdt,
_4lc.tvd, _2em.frq, _4ix.tvd, _21c.frq, _3ii.fdt, _2em.fdx, _4ix.tvf,
_4n1.tis, _cv.tii, _4mz.fnm, _308.tvx, _4dj.prx, _4lc.tvx, _43w.tvf,
_308.fnm, _3yn.frq, _43w.tvd, _43w.nrm, _it.prx, _4mx.frq, _cv.fnm,
_2n0.tvx, _1l9.tii, _4n0.fdt, _418.tis, _418.tii, _1l9.tis, _4n3.nrm,
_1l9.nrm, _4mw.frq, _4mf.prx, _4ix.tvx, _1l9.tvx, _2ie.fdx, _1b9.fnm,
_43w.tvx, _2n0.tvd, _4n3.tii, _2n0.tvf, _3ii.fdx, _4n1.tii, _2em.nrm,
_4n3.tis, _308.nrm, _2ie.fdt]
Nov 13, 2012 2:02:27 PM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1352803609067
Nov 13, 2012 2:02:27 PM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: {add=[SingleCoreWireImpl@3005994, SingleCoreWireImpl@3005997,
SingleCoreWireImpl@3005996, SingleCoreWireImpl@3005999,
SingleCoreWireImpl@3005998, SingleCoreWireImpl@3005985,
SingleCoreWireImpl@3005984, SingleCoreWireImpl@3005987, ... (500 adds)]} 0
85
Nov 13, 2012 2:02:27 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={wt=javabin&version=2} status=0
QTime=85 
Nov 13, 2012 2:02:27 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
Exception in thread "Lucene Merge Thread #0"
org.apache.lucene.index.MergePolicy$MergeException: 

SolrCloudServer and SolrServerException No live SolrServers available

2012-11-13 Thread iwo
Hi,
   I'm using Solr 4 (4.0.0.2012.03.17.15.05.35) with a cloud architecture, and
I would like to use CloudSolrServer from SolrJ, but I receive a
SolrServerException.

org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request
at
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:322)
at
org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:257)

This is my JUnit test code:

CloudSolrServer server = new CloudSolrServer(myEndPointZkHost);
Map<String, String> map = new HashMap<String, String>();
map.put("collection", "myCollectionName");
map.put("q", "*:*");
SolrParams q = new MapSolrParams(map);
SolrRequest request = new QueryRequest(q);
NamedList responseList = server.request(request);

and this is ZkStatus

live nodes:
[search01:8889_solr, search02:8889_solr, search01:_solr,
search02:_solr] 
collections:{
myCollectionName={
shard1=shard1:{
  search01:_solr_myCollectionName:{
shard:shard1,
leader:true,
state:active,
core:myCollectionName,
collection:myCollectionName,
node_name:search01:_solr,
base_url:http://search01:/solr},
  search02:8889_solr_myCollectionName:{
shard:shard1,
state:active,
core:myCollectionName,
collection:myCollectionName,
node_name:search02:8889_solr,
base_url:http://search02:8889/solr},
  replicas:{}}, 
shard2=shard2:{
  search01:8889_solr_myCollectionName:{
shard:shard2,
leader:true,
state:active,
core:myCollectionName,
collection:myCollectionName,
node_name:search01:8889_solr,
base_url:http://search01:8889/solr},
  search02:_solr_myCollectionName:{
shard:shard2,
state:active,
core:myCollectionName,
collection:myCollectionName,
node_name:search02:_solr,
base_url:http://search02:/solr},
  replicas:{}}
}


Am I doing something wrong?
Thanks
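
For reference, the same query can also be issued by setting a default collection
on the client instead of passing a per-request collection param (SolrJ 4.x API;
endpoint and collection names as above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

CloudSolrServer server = new CloudSolrServer(myEndPointZkHost);
server.setDefaultCollection("myCollectionName");   // route requests to this collection
QueryResponse rsp = server.query(new SolrQuery("*:*"));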





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloudServer-and-SolrServerException-No-live-SolrServers-available-tp4020091.html
Sent from the Solr - User mailing list archive at Nabble.com.


AW: java.io.IOException: Map failed :: OutOfMemory

2012-11-13 Thread André Widhani
I just saw that you are running on SUSE 11 - unlike RHEL, for example, it does
not have the virtual memory limit set to unlimited by default.

Please check the virtual memory limit (ulimit -v) for the operating system user
that runs Tomcat/Solr.

Since 3.1, Solr maps the index files into virtual memory. So if your index
files are larger than the allowed virtual memory, these kinds of exceptions
become likely.
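
For example, from a shell belonging to that user (plain POSIX ulimit; making the
change permanent, e.g. via /etc/security/limits.conf, is a separate step):

ulimit -v              # print the current virtual memory limit, in kB
ulimit -v unlimited    # lift the limit for this shell and its children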

Regards,
André


From: uwe72 [uwe.clem...@exxcellent.de]
Sent: Tuesday, 13 November 2012 17:58
To: solr-user@lucene.apache.org
Subject: Re: java.io.IOException: Map failed :: OutOfMemory

today the same exception:

[quoted commit log and segment-file listing trimmed; identical to the message above]

Re: AW: java.io.IOException: Map failed :: OutOfMemory

2012-11-13 Thread uwe72
Thanks André!

In parallel I also found this thread:
http://grokbase.com/t/lucene/solr-user/117m8e9n8t/solr-3-3-exception-in-thread-lucene-merge-thread-1

They are talking about the same problem.

We just started the importer again with the limit removed (ulimit -v unlimited);
then we will see.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/java-io-IOException-Map-failed-OutOfMemory-tp4019802p4020134.html
Sent from the Solr - User mailing list archive at Nabble.com.


Searchers, threads and performance

2012-11-13 Thread Andy Lester
We're getting close to deploying our Solr search solution, and we're doing 
performance testing, and we've run into some questions and concerns.

Our number one problem: Doing a commit from loading records, which can happen 
throughout the day, makes all queries stop for 5-7 seconds.  This is a 
showstopper for deployment.

Here's what we've observed: Upon commit, Solr finishes processing queries in 
flight, starts up a new searcher, warms it, shuts down the old searcher and 
puts the new searcher into effect. Does the old searcher stop taking requests 
before the new searcher is warmed or after? How wide is the window of time 
wherein Solr is not serving requests?  For us, it's about five seconds and we 
need to drop that dramatically.  In general, what is the difference between 
accepting the delay of waiting for warming vs. accepting the delay of running 
useColdSearcher=true?

Is there any such thing as/any sense in running more than one searcher in our 
scenario?  What are the benefits of multiple searchers?  Erik Erikson posts in 
2012: Unless you have warming happening, there should only be a single 
searcher open at any given time. Except: If your queries run across several 
commits you'll get multiple searchers open. Not sure if this is a general 
observation, or specific to the particular poster's situation.

Finally, what do people mean when they blog that they have Solr set up for n 
threads? Is that the same thing as saying that Solr can be processing n 
requests simultaneously?

Thanks for any insight or even links to relevant pages.  We've been Googling 
all over and haven't found answers to the above.

Thanks,
Andy

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance




URL parameters to use FieldAnalysisRequestHandler

2012-11-13 Thread Tom Burton-West
Hello,

I  would like to send a request to the FieldAnalysisRequestHandler.  The
javadoc lists the parameter names such as analysis.field, but sending those
as URL parameters does not seem to work:

mysolr.umich.edu/analysis/field?analysis.name=title&q=fire-fly

leaving out the analysis doesn't work either:

mysolr.umich.edu/analysis/field?name=title&q=fire-fly

No matter what field I specify, the analysis returned is for the default
field. (See repsonse excerpt below)

Is there a page somewhere that shows the correct syntax for sending get
requests to the FieldAnalysisRequestHandler?

Tom


<lst name="analysis">
  <lst name="field_types"/>
  <lst name="field_names">
    <lst name="ocr">


Re: URL parameters to use FieldAnalysisRequestHandler

2012-11-13 Thread Robert Muir
I think the UI uses this behind the scenes, as in no more
analysis.jsp like before?

So maybe try using something like burpsuite and just using the
analysis UI in your browser to see what requests it's sending.

On Tue, Nov 13, 2012 at 11:00 AM, Tom Burton-West tburt...@umich.edu wrote:
 Hello,

 I  would like to send a request to the FieldAnalysisRequestHandler.  The
 javadoc lists the parameter names such as analysis.field, but sending those
 as URL parameters does not seem to work:

 mysolr.umich.edu/analysis/field?analysis.name=title&q=fire-fly

 leaving out the analysis doesn't work either:

 mysolr.umich.edu/analysis/field?name=title&q=fire-fly

 No matter what field I specify, the analysis returned is for the default
 field. (See repsonse excerpt below)

 Is there a page somewhere that shows the correct syntax for sending get
 requests to the FieldAnalysisRequestHandler?

 Tom

 
 <lst name="analysis">
 <lst name="field_types"/>
 <lst name="field_names">
 <lst name="ocr">


Re: Searchers, threads and performance

2012-11-13 Thread Mikhail Khludnev
Andy,

Solr is supposed to keep serving requests with the old searcher for a while. If
the pause lasts a few seconds you can take a thread dump and see clearly what it
is waiting for.
Just a guess: if you have many threads configured in the servlet container pool
and push high load, then warming can significantly impact your search
latency - try to limit the acceptable load by reducing the number of concurrent
requests.
What are CPU utilization and I/O stats during your test? Do you watch the GC
log? To me a GC spike is more probable than warming impact. Are you
sure that you use MMapDirectory, and on which OS?
Once again:
- thread dump?
- io/vm-stat dump?
- gclog?
- thread pool size at servlet container config?
- directory impl and os?
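
For the thread dump itself, something along these lines against the servlet
container's JVM works (PID placeholder assumed; two or three dumps taken a few
seconds apart are more useful than one):

jstack <tomcat-pid> > /tmp/solr-threads-1.txt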


On Tue, Nov 13, 2012 at 7:20 PM, Andy Lester a...@petdance.com wrote:

 We're getting close to deploying our Solr search solution, and we're doing
 performance testing, and we've run into some questions and concerns.

 Our number one problem: Doing a commit from loading records, which can
 happen throughout the day, makes all queries stop for 5-7 seconds.  This is
 a showstopper for deployment.

 Here's what we've observed: Upon commit, Solr finishes processing queries
 in flight, starts up a new searcher, warms it, shuts down the old searcher
 and puts the new searcher into effect. Does the old searcher stop taking
 requests before the new searcher is warmed or after? How wide is the window
 of time wherein Solr is not serving requests?  For us, it's about five
 seconds and we need to drop that dramatically.  In general, what is the
 difference between accepting the delay of waiting for warming vs. accepting
 the delay of running useColdSearcher=true?

 Is there any such thing as/any sense in running more than one searcher in
 our scenario?  What are the benefits of multiple searchers?  Erik Erikson
 posts in 2012: Unless you have warming happening, there should only be a
 single searcher open at any given time. Except: If your queries run
 across several commits you'll get multiple searchers open. Not sure if
 this is a general observation, or specific to the particular poster's
 situation.

 Finally, what do people mean when they blog that they have Solr set up for
 n threads? Is that the same thing as saying that Solr can be processing n
 requests simultaneously?

 Thanks for any insight or even links to relevant pages.  We've been
 Googling all over and haven't found answers to the above.

 Thanks,
 xoa

 --
 Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


RE: sort by function error

2012-11-13 Thread Kuai, Ben
Hi Yonik

I will give the latest 4.0 release a try. 

Thanks anyway.

Cheers
Ben

From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley 
[yo...@lucidworks.com]
Sent: Tuesday, November 13, 2012 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: sort by function error

I can't reproduce this with the example data.  Here's an example of
what I tried:

http://localhost:8983/solr/query?q=*:*&sort=geodist(store,-32.123323,108.123323)+asc&group.field=inStock&group=true

Perhaps this is an issue that's since been fixed.

-Yonik
http://lucidworks.com


On Mon, Nov 12, 2012 at 11:19 PM, Kuai, Ben ben.k...@sensis.com.au wrote:
 Hi Yonik

 Thanks for the reply.
 My sample query,

 q=cafe&sort=geodist(geoLocation,-32.123323,108.123323)+asc&group.field=familyId

 <field name="geoLocation" type="latLon" indexed="true" stored="false"/>
 <field name="familyId" type="string" indexed="true" stored="false"/>

 As long as I remove the group field, the query works.

 BTW, I just found out that the version of Solr we are using is an old copy of
 a 4.0 snapshot from before the alpha release. Could that be the problem? We have
 some customized parsers, so it will take quite some time to upgrade.


 Ben
 
 From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley 
 [yo...@lucidworks.com]
 Sent: Tuesday, November 13, 2012 6:46 AM
 To: solr-user@lucene.apache.org
 Subject: Re: sort by function error

 On Mon, Nov 12, 2012 at 5:24 AM, Kuai, Ben ben.k...@sensis.com.au wrote:
 more information,  problem only happends when I have both sort by function 
 and grouping in query.

 I haven't been able to duplicate this with a few ad-hoc queries.
 Could you give your complete request (or at least all of the relevant
 grouping and sorting parameters), as well as the field type you are
 grouping on?

 -Yonik
 http://lucidworks.com


Re: URL parameters to use FieldAnalysisRequestHandler

2012-11-13 Thread Tom Burton-West
Thanks Robert,

Somehow I read the doc but still entered the params wrong.  It should have
been analysis.fieldname instead of analysis.name.  Works fine now.
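
That is, the request that works (field name as in my earlier message):

mysolr.umich.edu/analysis/field?analysis.fieldname=title&q=fire-fly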

Tom

On Tue, Nov 13, 2012 at 2:11 PM, Robert Muir rcm...@gmail.com wrote:

 I think the UI uses this behind the scenes, as in no more
 analysis.jsp like before?

 So maybe try using something like burpsuite and just using the
 analysis UI in your browser to see what requests its sending.

 On Tue, Nov 13, 2012 at 11:00 AM, Tom Burton-West tburt...@umich.edu
 wrote:
  Hello,
 
  I  would like to send a request to the FieldAnalysisRequestHandler.  The
  javadoc lists the parameter names such as analysis.field, but sending
 those
  as URL parameters does not seem to work:
 
  mysolr.umich.edu/analysis/field?analysis.name=title&q=fire-fly
 
  leaving out the analysis doesn't work either:
 
  mysolr.umich.edu/analysis/field?name=title&q=fire-fly
 
  No matter what field I specify, the analysis returned is for the default
  field. (See repsonse excerpt below)
 
  Is there a page somewhere that shows the correct syntax for sending get
  requests to the FieldAnalysisRequestHandler?
 
  Tom
 
  
  <lst name="analysis">
  <lst name="field_types"/>
  <lst name="field_names">
  <lst name="ocr">



Re: Solr v4: Synonyms... better at index time or query time?

2012-11-13 Thread Walter Underwood
Don't use query time synonyms. Explanation here:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
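
For reference, moving the same filter to index time would look roughly like this
(a sketch based on the example quoted below; synonyms.txt and expand="true" are
taken from that example, and the filter order is an assumption):

<analyzer type="index">
  <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>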

wunder

On Nov 13, 2012, at 1:25 PM, dm_tim wrote:

 I'm looking at the sample docs for Solr v4 and I noted something in the
 schema.xml file: The field type uses the synonymFilterFactory in the query
 section but has it commented out in the index section. What would the
 trade-offs be to using the synonymFilterFactory in the index section
 instead. I assume that it would be pointless to use it in both sections.
 
 Example below:
 <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.LowerCaseTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.LowerCaseTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>






Re: Solr GC issues - Too many BooleanQuery BooleanClause objects in heap

2012-11-13 Thread Prasanna R
We do have a custom query parser that is responsible for expanding the user
input query into a bunch of prefix, phrase and regular boolean queries in a
manner similar to that done by DisMax.

Analyzing heap with jhat/YourKit is on my list of things to do but I
haven't gotten around to doing it yet. Our big heap size (13G) makes it a
little difficult to do a full blown heap dump analysis.

Thanks a ton for the reply Otis!

Prasanna

On Mon, Nov 12, 2012 at 5:42 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Hi,

 I've never seen this.  You don't have a custom query parser or anything
 else custom, do you?
 Have you tried dumping and analyzing heap?  YourKit has a 7 day eval, or
 you can use things like jhat, which may be included on your machine already
 (see http://docs.oracle.com/javase/6/docs/technotes/tools/share/jhat.html).

 Otis
 --
 Performance Monitoring - http://sematext.com/spm/index.html


 On Mon, Nov 12, 2012 at 8:35 PM, Prasanna R plistma...@gmail.com wrote:

   We have been using Solr in a custom setup where we generate results for
  user queries by expanding it to a large boolean query consisting of
  multiple prefix queries. There have been some GC issues recently with the
  Old/tenured generation becoming nearly 100% full leading to near constant
  full GC cycles.
 
  We are running Solr 3.1 on servers with 13G of heap. jmap live object
  histogram is as follows:
 
   num   #instances      #bytes  class name
   ----------------------------------------------------------
    1:     27441222  1550723760  [Ljava.lang.Object;
    2:     23546318   879258496  [C
    3:     23813405   762028960  java.lang.String
    4:     22700095   726403040  org.apache.lucene.search.BooleanQuery
    5:     27431515   658356360  java.util.ArrayList
    6:     22911883   549885192  org.apache.lucene.search.BooleanClause
    7:     21651039   519624936  org.apache.lucene.index.Term
    8:      6876651   495118872  org.apache.lucene.index.FieldsReader$LazyField
    9:     11354214   363334848  org.apache.lucene.search.PrefixQuery
   10:      4281624   137011968  java.util.HashMap$Entry
   11:      3466680    83200320  org.apache.lucene.search.TermQuery
   12:      1987450    79498000  org.apache.lucene.search.PhraseQuery
   13:       631994    70148624  [Ljava.util.HashMap$Entry;
  .
 
  I have looked at the Solr cache settings multiple times but am not able
 to
  figure out how/why the high number of BooleanQuery and BooleanClause
 object
  instances stay alive. These objects are live and do not get collected
 even
  when the traffic is disabled and a manual GC is triggered which indicates
  that someone is holding onto references.
 
  Can anyone provide more details on the circumstances under which these
  objects stay alive and/or cached? If they are cached then is the caching
  configurable?
 
  Any and all tips/suggestions/pointers will be much appreciated.
 
  Thanks,
 
  Prasanna
 



Custom Solr indexer/searcher

2012-11-13 Thread Scott Smith
Suppose I have a special data search type (something different than a string or 
numeric value) that I want to integrate into the Solr server.  For example, 
suppose I wanted to implement a KD-tree as a filter that would integrate with 
standard Solr filters and queries.  I might want to say find all of the 
documents in the index with the word 'tree' in them that are within a certain 
distance of a particular document in the KD-tree.  Let me add that I'm not 
really looking for a KD-Tree implementation for Solr; I just assume that a fair 
number of people will know what a KD-tree is and so, have some idea that I'm 
talking about adding a new data type (different than string, long, etc.) that 
Solr will need to be able to index and search with.  It's important that the 
new data type should integrate with the existing standard Solr data types for 
searching purposes.

First, is there a way to build and specify a plugin that provides Solr both the 
indexer and search interfaces and therefore hides the internal details of 
what's going on in the search from Solr so it just thinks it's another search 
type?  Or, would I have to hack Solr in a lot of places to add my custom data 
type in?

Second, if the interface(s) exists to add in a new data type, is there 
documentation (tutorial, examples, etc.) anywhere on how to do this.  Or, is my 
only option to dig into the Solr code?
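
To make the ask concrete, here is the shape of thing I am imagining - a minimal
custom field type against the Solr 4.0 plugin API (package and class names
hypothetical, behavior just a placeholder; a real KD-tree type would presumably
also override getFieldQuery()/getRangeQuery() for the query side):

package com.example; // hypothetical

import java.io.IOException;

import org.apache.lucene.index.IndexableField;
import org.apache.lucene.search.SortField;
import org.apache.solr.response.TextResponseWriter;
import org.apache.solr.schema.FieldType;
import org.apache.solr.schema.SchemaField;

public class KDTreeFieldType extends FieldType {

  @Override
  public void write(TextResponseWriter writer, String name, IndexableField f)
      throws IOException {
    writer.writeStr(name, f.stringValue(), true); // echo the stored value
  }

  @Override
  public SortField getSortField(SchemaField field, boolean reverse) {
    return getStringSort(field, reverse); // placeholder ordering
  }
}

which schema.xml would then reference with something like
<fieldType name="kdtree" class="com.example.KDTreeFieldType"/>.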

Mostly, I'm looking for some links or suggestions on where to start looking.  I 
doubt this subject is simple enough to fit into an email post (though I'd be 
happy to be surprised :) ).  You can assume Solr 4.0 if that makes things 
easier.  You can also assume that I have some familiarity with Lucene (though I 
haven't hacked that code either).

Hopefully, I've explained this well enough so that people know what I'm looking 
for.

Cheers

Scott



Re: Testing Solr Cloud with ZooKeeper

2012-11-13 Thread darul
Looks like after the timeout has elapsed, the first Solr instance responds.

I was not waiting long enough. Is it possible to reduce this timeout value?

Thanks
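
If the wait in question is the leader-vote wait at startup, it can apparently be
lowered in solr.xml (a sketch - attribute taken from the 4.x solr.xml docs, value
in milliseconds chosen arbitrarily):

<cores adminPath="/admin/cores" hostPort="8983" leaderVoteWait="15000">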



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4020190.html
Sent from the Solr - User mailing list archive at Nabble.com.


Has anyone HunspellStemFilterFactory working?

2012-11-13 Thread Rob Koeling
If so, would you be willing to share the .dic and .aff files with me?
When I try to load a dictionary file, Solr is complaining that:

java.lang.RuntimeException: java.io.IOException: Unable to load hunspell
data! [dictionary=en_GB.dic,affix=en_GB.aff]
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:116)
...
Caused by: java.text.ParseException: The first non-comment line in the
affix file must be a 'SET charset', was: 'FLAG num'
at
org.apache.lucene.analysis.hunspell.HunspellDictionary.getDictionaryEncoding(HunspellDictionary.java:306)
at
org.apache.lucene.analysis.hunspell.HunspellDictionary.<init>(HunspellDictionary.java:130)
at
org.apache.lucene.analysis.hunspell.HunspellStemFilterFactory.inform(HunspellStemFilterFactory.java:103)
... 46 more

When I change the first line to 'SET charset' it is still not happy. I got
the dictionary files from the OpenOffice website.

I'm using Solr 4.0 (but had the same problem with 3.6)
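
For what it's worth, the parser seems to want an actual charset name after SET
rather than the literal word charset, so the first non-comment line of the .aff
file would need to look like one of these (matching the encoding of the .dic file):

SET UTF-8
SET ISO8859-1

Whether the rest of the OpenOffice en_GB files then parse under the 3.6/4.0
Hunspell code is a separate question.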

  - Rob


Solr 4.0 Dismax woes (2 specifically)

2012-11-13 Thread dm_tim
Heck,

I originally started using the default query parser but gave up on it
because all of my search results are equally important and idf was messing
up my results pretty badly. So I discovered the DisMax query parser which
doesn't use idf. I was elated until I started testing. My initial results
looked good but when I cut down the query string from clothes to clot I
got zero results. 

I've been reading about how disMax is supposed to do fuzzy searches but I
can't make it work at all. 

To complicate matters, I discovered that all of my search words are being
used against all of the query fields. I had previously assumed that each
search word would only be applied to individual query fields. 

So for example my q is:
clothe 95

And my qf:
tag cid

So I believe that the words clothe and 95 are being searched on both
fields (tag and cid) which is not what I wanted to do. I was hoping to
have cloth applied only to the tag field and 95 applied only to the
cid field.

I really don't have it in me to write my own query parser so I'm hoping to
find a way to do a fuzzy search without scores being screwed by idf. Is
there a way to achieve my desired results with existing code?
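
One stopgap that avoids both problems: the standard (lucene) parser lets you aim
each term at its own field, and in 4.0 wildcard queries are rewritten as
constant-score queries, so idf does not enter into it (field names from above; a
trailing wildcard as a stand-in for real fuzzy matching):

q=tag:clot* AND cid:95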

Regards,

(A tired) Tim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-Dismax-woes-2-specifically-tp4020197.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr v4: Synonyms... better at index time or query time?

2012-11-13 Thread dm_tim
Good to know. Thanks.

T



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-v4-Synonyms-better-at-index-time-or-query-time-tp4020179p4020198.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr4.0 / SolrCloud queries

2012-11-13 Thread shreejay
Thanks Mark. I meant ConcurrentMergeScheduler and ramBufferSizeMB (not
maxBuffer). These are my settings for Merge. 

<ramBufferSizeMB>960</ramBufferSizeMB>
<mergeFactor>40</mergeFactor>
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>



--Shreejay


Mark Miller-3 wrote
 On Nov 9, 2012, at 1:20 PM, shreejay <shreejayn@> wrote:
 
 Instead of doing an optimize, I have now changed the Merge settings by
 keeping a maxBuffer = 960, a merge Factor = 40 and ConcurrentMergePolicy. 
 
 Don't you mean ConcurrentMergeScheduler?
 
 Keep in mind that if you use the default TieredMergePolicy, mergeFactor
 will have no affect. You need to use  maxMergeAtOnce and segmentsPerTier
 as sub args to the merge policy config (see the commented out example in
 solrconfig.xml). 
 
 Also, it's probably best to avoid using maxBufferedDocs at all.
 
 - Mark
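
Following Mark's pointer, the commented-out example in the stock 4.0
solrconfig.xml looks roughly like this (a sketch - the values are the example's
defaults, not tuned ones; with TieredMergePolicy these sub-args take the place
of mergeFactor):

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>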







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr4-0-SolrCloud-queries-tp4016825p4020200.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Nested Join Queries

2012-11-13 Thread Gerald Blanck
Thank you Mikhail.  Unfortunately BlockJoinQuery is not an option we can
leverage.

- We have modeled our document types as different indexes/cores.
- Our relationships which we are attempting to join across are not
single-parent to many-children relationships.  They are in fact many to
many.
- Additionally, memory usage is a concern.

FYI.  After making the code change I mentioned in my original post, we have
completed a full test cycle and did not experience any adverse impacts from
the change.  And our join query functionality returns the results we
wanted.  I would still be interested in hearing an explanation as to why
the code is written as it is in v4.0.0.

Thanks.




On Tue, Nov 13, 2012 at 8:31 AM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Please find reference materials


 http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
 http://blog.griddynamics.com/2012/08/block-join-query-performs.html




 On Tue, Nov 13, 2012 at 3:25 PM, Gerald Blanck 
 gerald.bla...@barometerit.com wrote:

 Thank you.  I've not heard of BlockJoin.  I will look into it today.
  Thanks.


 On Tue, Nov 13, 2012 at 5:05 AM, Mikhail Khludnev 
 mkhlud...@griddynamics.com wrote:

 Replied. pls check maillist.



 On Tue, Nov 13, 2012 at 11:44 AM, Mikhail Khludnev 
 mkhlud...@griddynamics.com wrote:

 Gerald,

 I wonder if you tried to approach BlockJoin for your problem? Can you
 afford less frequent updates?


 On Wed, Nov 7, 2012 at 5:40 PM, Gerald Blanck 
 gerald.bla...@barometerit.com wrote:

 Thank you Erick for your reply.  I understand that search is not an
 RDBMS.
  Yes, we do have a huge combinatorial explosion if we de-normalize and
 duplicate data.  In fact, I believe our use case is exactly what the
 Solr
 developers were trying to solve with the addition of the Join query.
  And
 while the example I gave illustrates the problem we are solving with
 the
 Join functionality, it is simplistic in nature compared to what we
 have in
 actuality.

 Am still looking for an answer here if someone can shed some light.
  Thanks.


 On Sat, Nov 3, 2012 at 9:38 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

  I'm going to go a bit sideways on you, partly because I can't answer
 the
  question G...
 
  But, every time I see someone doing what looks like substituting
 core for
  table and
  then trying to use Solr like a DB, I get on my soap-box and
 preach..
 
  In this case, consider de-normalizing your DB so you can ask the
 query in
  terms
  of search rather than joins. e.g.
 
  Make each document a combination of the author and the book, with an
  additional
  field author_has_written_a_bestseller. Now your query becomes a
 really
  simple
  search, author:name AND author_has_written_a_bestseller:true.
 True, this
  kind
  of approach isn't as flexible as an RDBMS, but it's a _search_
 rather than
  a query.
  Yes, it replicates data, but unless you have a huge combinatorial
  explosion, that's
  not a problem.
 
  And the join functionality isn't called pseudo for nothing. It was
  written for a specific
  use-case. It is often expensive, especially when the field being
 joined has
  many unique
  values.
 
  FWIW,
  Erick
 
 
  On Fri, Nov 2, 2012 at 11:32 AM, Gerald Blanck 
  gerald.bla...@barometerit.com wrote:
 
   At a high level, I have a need to be able to execute a query that
 joins
   across cores, and that query during its joining may join back to
 the
   originating core.
  
   Example:
   Find all Books written by an Author who has written a best selling
 Book.
  
   In Solr query syntax
   A) against the book core - bestseller:true
   B) against the author core - {!join fromIndex=book from=id
   to=bookid}bestseller:true
   C) against the book core - {!join fromIndex=author from=id
   to=authorid}{!join fromIndex=book from=id to=bookid}bestseller:true
  
   A - returns results
   B - returns results
   C - does not return results
  
   Given that A and C use the same core, I started looking for join
 code
  that
   compares the originating core to the fromIndex and found this
   in JoinQParserPlugin (line #159).
  
   if (info.getReq().getCore() == fromCore) {
  
 // if this is the same core, use the searcher passed
 in...
   otherwise we could be warming and
  
 // get an older searcher from the core.
  
 fromSearcher = searcher;
  
   } else {
  
 // This could block if there is a static warming query
 with a
   join in it, and if useColdSearcher is true.
  
 // Deadlock could result if two cores both had
 useColdSearcher
   and had joins that used eachother.
  
 // This would be very predictable though (should happen
 every
   time if misconfigured)
  
 fromRef = fromCore.getSearcher(false, true, null);
  
  
 // be careful not to do anything with this searcher that
  requires
   the thread local
  
 // SolrRequestInfo in a manner that 

Re: Solr 4.0 - distributed updates without zookeeper?

2012-11-13 Thread Peter Wolanin
Yes, basically I want to at least avoid leader election and the other
dynamic behaviors.  I don't have any experience with ZK, and a lot of
magic behavior seems baked in now, so I'm concerned I'd need to
dig into ZK to debug or monitor what's really happening as we scale
out.

We also have a somewhat non-typical use case: lots of small
cores/indexes on the same server, rather than large indexes that might
need multiple shards.

We have master servers that have persistent (but sometimes slower)
storage, and slaves with faster non-persistent disk.

My colleague noticed that there is a param to flag a server as
eligible to be a shard leader, so I guess we could enable that for
only the preferred master?

I'm also having trouble understanding config handling from the docs.
Even browsing the java code I can't tell whether Solr is creating the
instance dirs or somehow just linking to config files.  It sounds as
though if I create a core using core admin, it would get associated
with a collection of the same name.

-Peter

On Mon, Nov 12, 2012 at 9:41 PM, Otis Gospodnetic
otis.gospodne...@gmail.com wrote:
 Hi Peter,

 Not sure I have the answer for you, but are you looking to avoid using ZK
 for some reason?
 Or are you OK with ZK per se, but just don't want any leader re-election
 and any other dynamic/cloudy behaviour?

 Could you not simply treat 1 node as the master to which you send all
 your updates and let SolrCloud distribute that to the rest of the cluster?
 Is your main/only worry around what happens if this 1 node that you
 designated as the master goes down? What would you like to happen?  You'd
 like indexing to start failing, while the search functionality remains up?

 Otis
 --
 Search Analytics - http://sematext.com/search-analytics/index.html
 Performance Monitoring - http://sematext.com/spm/index.html


 On Sun, Nov 11, 2012 at 7:42 PM, Peter Wolanin 
 peter.wola...@acquia.comwrote:

 Looking at how we could upgrade some of our infrastructure to Solr 4.0
 - I would really like to take advantage of distributed updates to get
 NRT, but we want to keep our fixed master and slave server roles since
 we use different hardware appropriate to the different roles.

 Looking at the solr 4.0 distributed update code, it seems really
 hard-coded and bound to zookeeper.  Is there a way to have a solr
 master distribute updates without using ZK, or a way to mock the ZK
 interface to provide a fixed cluster topography that will work when
 sending updates just to the master?

 To be clear, if the master goes doen I don't want a slave promoted,
 nor do I want most of the other SolrCloud features - we have already
 built out a system for managing groups of servers.

 Thanks,

 Peter




-- 
Peter M. Wolanin, Ph.D.  : Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com : 781-313-8322

Get a free, hosted Drupal 7 site: http://www.drupalgardens.com;


Re: Solr GC issues - Too many BooleanQuery BooleanClause objects in heap

2012-11-13 Thread Otis Gospodnetic
Hi,

Yeah, large heap can be problematic like that. :)
But if there is some sort of a leak (and if I had to bet, I'd put my money
on your custom QP, knowing what I know about this situation), you could also
start Solr with a much smaller heap and grab a heap snapshot as soon as
you see some number of those objects appearing towards the top of the jmap
output - that should be enough to trace them to their roots.
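
For the snapshot itself, the stock JDK tools are enough (PID placeholder assumed):

jmap -histo:live <solr-pid> | head -20
jmap -dump:live,format=b,file=/tmp/solr-heap.hprof <solr-pid>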

Otis
--
Solr Performance Monitoring - http://sematext.com/spm/index.html


On Tue, Nov 13, 2012 at 5:18 PM, Prasanna R plistma...@gmail.com wrote:

 We do have a custom query parser that is responsible for expanding the user
 input query into a bunch of prefix, phrase and regular boolean queries in a
 manner similar to that done by DisMax.

 Analyzing heap with jhat/YourKit is on my list of things to do but I
 haven't gotten around to doing it yet. Our big heap size (13G) makes it a
 little difficult to do a full blown heap dump analysis.

 Thanks a ton for the reply Otis!

 Prasanna

 On Mon, Nov 12, 2012 at 5:42 PM, Otis Gospodnetic 
 otis.gospodne...@gmail.com wrote:

  Hi,
 
  I've never seen this.  You don't have a custom query parser or anything
  else custom, do you?
  Have you tried dumping and analyzing heap?  YourKit has a 7 day eval, or
  you can use things like jhat, which may be included on your machine
 already
  (see
 http://docs.oracle.com/javase/6/docs/technotes/tools/share/jhat.html).
 
  Otis
  --
  Performance Monitoring - http://sematext.com/spm/index.html
 
 
  On Mon, Nov 12, 2012 at 8:35 PM, Prasanna R plistma...@gmail.com
 wrote:
 
We have been using Solr in a custom setup where we generate results
 for
   user queries by expanding it to a large boolean query consisting of
   multiple prefix queries. There have been some GC issues recently with
 the
   Old/tenured generation becoming nearly 100% full leading to near
 constant
   full GC cycles.
  
   We are running Solr 3.1 on servers with 13G of heap. jmap live object
   histogram is as follows:
  
    num   #instances      #bytes  class name
    ----------------------------------------------------------
     1:     27441222  1550723760  [Ljava.lang.Object;
     2:     23546318   879258496  [C
     3:     23813405   762028960  java.lang.String
     4:     22700095   726403040  org.apache.lucene.search.BooleanQuery
     5:     27431515   658356360  java.util.ArrayList
     6:     22911883   549885192  org.apache.lucene.search.BooleanClause
     7:     21651039   519624936  org.apache.lucene.index.Term
     8:      6876651   495118872  org.apache.lucene.index.FieldsReader$LazyField
     9:     11354214   363334848  org.apache.lucene.search.PrefixQuery
    10:      4281624   137011968  java.util.HashMap$Entry
    11:      3466680    83200320  org.apache.lucene.search.TermQuery
    12:      1987450    79498000  org.apache.lucene.search.PhraseQuery
    13:       631994    70148624  [Ljava.util.HashMap$Entry;
   .
  
   I have looked at the Solr cache settings multiple times but am not able
  to
   figure out how/why the high number of BooleanQuery and BooleanClause
  object
   instances stay alive. These objects are live and do not get collected
  even
   when the traffic is disabled and a manual GC is triggered which
 indicates
   that someone is holding onto references.
  
   Can anyone provide more details on the circumstances under which these
   objects stay alive and/or cached? If they are cached then is the
 caching
   configurable?
  
   Any and all tips/suggestions/pointers will be much appreciated.
  
   Thanks,
  
   Prasanna
  
 



Re: Run multiple instances of solr using single data directory

2012-11-13 Thread Otis Gospodnetic
Hi,

If you have high query rate, running multiple instances of Solr on the same
server doesn't typically make sense.  I'd stop and rethink :)

Otis
--
Solr Performance Monitoring - http://sematext.com/spm/index.html


On Tue, Nov 13, 2012 at 5:46 PM, Rohit Harchandani rhar...@gmail.comwrote:

 Hi All,
 I am currently using solr 4.0. The application I am working on requires a
 high rate of queries per second.
 Currently, we have setup a single master and a single slave on a production
 machine. We want to bring up multiple instances of solr (slaves). Are there
  any problems when bringing them up on different ports but using the same
 data directory? These will be only serving up queries and all the indexing
 will take place on the master machine.

 Also, if i have multiple instances from the same data directory and i
 perform replication. Would that re-open searchers on all the instances?
 Thanks,
 Rohit



Re: Searchers, threads and performance

2012-11-13 Thread Otis Gospodnetic
Hello Andy,

On Tue, Nov 13, 2012 at 1:26 PM, Andy Lester a...@petdance.com wrote:

 We're getting close to deploying our Solr search solution, and we're doing
 performance testing, and we've run into some questions and concerns.

 Our number one problem: Doing a commit from loading records, which can
 happen throughout the day, makes all queries stop for 5-7 seconds.  This is
 a showstopper for deployment.

 Here's what we've observed: Upon commit, Solr finishes processing queries
 in flight, starts up a new searcher, warms it, shuts down the old searcher
 and puts the new searcher into effect. Does the old searcher stop taking
 requests before the new searcher is warmed or after? How wide is the window
 of time wherein Solr is not serving requests?  For us, it's about five
 seconds and we need to drop that dramatically.  In general, what is the
 difference between accepting the delay of waiting for warming vs. accepting
 the delay of running useColdSearcher=true?


Old searcher is used while the new one is being warmed up.
Solr should always be serving requests - it's not designed to have a
moment when it's not serving them because of a searcher swap.
Don't use cold searcher - the first unlucky user will pay the price and
will likely block all other queries for a while.
Your queries probably don't actually stop for 5-7 seconds, they just slow
down.
Look at your system's performance metrics during this period.  GC high?
Disk IO high? CPU high?


 Is there any such thing as/any sense in running more than one searcher in
 our scenario?  What are the benefits of multiple searchers?  Erik Erikson
 posts in 2012: Unless you have warming happening, there should only be a
 single searcher open at any given time. Except: If your queries run
 across several commits you'll get multiple searchers open. Not sure if
 this is a general observation, or specific to the particular poster's
 situation.

 Finally, what do people mean when they blog that they have Solr set up for
 n threads? Is that the same thing as saying that Solr can be processing n
 requests simultaneously?


Not sure what they mean :)  It is the servlet container that deals with
threads, not Solr.  Maybe this is referring to indexing with N threads in
parallel to speed up  indexing (in pre-Solr 4.0 days).



 Thanks for any insight or even links to relevant pages.  We've been
 Googling all over and haven't found answers to the above.


Try http://search-lucene.com

If you are doing performance testing and warming is a concern, one of the
reports in SPM for Solr shows where warming time is being spent - on which
caches or on the searcher, and how much time goes on each.  Oh, which
reminds me - it is also possible your cache settings are such that they
require a lot of warming and it is possible that your warmup queries are
too heavy or numerous.
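
(Those knobs live in solrconfig.xml; a sketch with illustrative values only -
lowering autowarmCount reduces the work done on each searcher swap:)

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>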

Otis
-- 
Solr Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html


Using CJK analyzer

2012-11-13 Thread johnmunir
Hi,


Using Solr 1.2.0, the following works (and I get hits searching on Chinese 
text):


<fieldType name="text" class="solr.TextField">
  <analyzer type="index" class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
  <analyzer type="query" class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
</fieldType>



But it won't work using Solr 3.6.1.  Any idea what I might be missing?


Yes, I also tried (in Solr 3.6.1):



<!-- CJK bigram (see text_ja for a Japanese configuration using morphological analysis) -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- normalize width before bigram, as e.g. half-width dakuten combine -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- for any non-CJK -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>



and it won't work.


I run it through the analyzer and I see this (I hope the table will show up 
fine on the mailing list):


Index Analyzer
org.apache.lucene.analysis.cn.ChineseAnalyzer {}

position:     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
term text:    去  除  商  品  操  作  在  订  购  单  中  留  下  空  白  行
startOffset:  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
endOffset:    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16

Query Analyzer
org.apache.lucene.analysis.cn.ChineseAnalyzer {}

position:     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
term text:    去  除  商  品  操  作  在  订  购  单  中  留  下  空  白  行
startOffset:  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
endOffset:    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16






--MJ


Re: Nested Join Queries

2012-11-13 Thread Mikhail Khludnev
Gerald,
Nice to hear that your problem is solved. Can you contribute a test case
that reproduces this issue?

FWIW, my team successfully deals with many-to-many in BlockJoin. It works,
but the solution is still a little immature.


On Wed, Nov 14, 2012 at 5:59 AM, Gerald Blanck 
gerald.bla...@barometerit.com wrote:

 Thank you Mikhail.  Unfortunately BlockJoinQuery is not an option we can
 leverage.

 - We have modeled our document types as different indexes/cores.
 - Our relationships which we are attempting to join across are not
 single-parent to many-children relationships.  They are in fact many to
 many.
 - Additionally, memory usage is a concern.

 FYI.  After making the code change I mentioned in my original post, we
 have completed a full test cycle and did not experience any adverse impacts
 to the change.  And our join query functionality returns the results we
 wanted.  I would still be interested in hearing an explanation as to why
 the code is written as it is in v4.0.0.

 Thanks.




 On Tue, Nov 13, 2012 at 8:31 AM, Mikhail Khludnev 
 mkhlud...@griddynamics.com wrote:

 Please find reference materials


 http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
 http://blog.griddynamics.com/2012/08/block-join-query-performs.html




 On Tue, Nov 13, 2012 at 3:25 PM, Gerald Blanck 
 gerald.bla...@barometerit.com wrote:

 Thank you.  I've not heard of BlockJoin.  I will look into it today.
  Thanks.


 On Tue, Nov 13, 2012 at 5:05 AM, Mikhail Khludnev 
 mkhlud...@griddynamics.com wrote:

 Replied. pls check maillist.



 On Tue, Nov 13, 2012 at 11:44 AM, Mikhail Khludnev 
 mkhlud...@griddynamics.com wrote:

 Gerald,

 I wonder if you tried to approach BlockJoin for your problem? Can you
 afford less frequent updates?


 On Wed, Nov 7, 2012 at 5:40 PM, Gerald Blanck 
 gerald.bla...@barometerit.com wrote:

 Thank you Erick for your reply.  I understand that search is not an
 RDBMS.
  Yes, we do have a huge combinatorial explosion if we de-normalize and
 duplicate data.  In fact, I believe our use case is exactly what the
 Solr
 developers were trying to solve with the addition of the Join query.
  And
 while the example I gave illustrates the problem we are solving with
 the
 Join functionality, it is simplistic in nature compared to what we
 have in
 actuality.

 Am still looking for an answer here if someone can shed some light.
  Thanks.


 On Sat, Nov 3, 2012 at 9:38 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

  I'm going to go a bit sideways on you, partly because I can't
 answer the
  question G...
 
  But, every time I see someone doing what looks like substituting
 core for
  table and
  then trying to use Solr like a DB, I get on my soap-box and
 preach..
 
  In this case, consider de-normalizing your DB so you can ask the
 query in
  terms
  of search rather than joins. e.g.
 
  Make each document a combination of the author and the book, with an
  additional
  field author_has_written_a_bestseller. Now your query becomes a
 really
  simple
  search, author:name AND author_has_written_a_bestseller:true.
 True, this
  kind
  of approach isn't as flexible as an RDBMS, but it's a _search_
 rather than
  a query.
  Yes, it replicates data, but unless you have a huge combinatorial
  explosion, that's
  not a problem.
 
  And the join functionality isn't called pseudo for nothing. It was
  written for a specific
  use-case. It is often expensive, especially when the field being
 joined has
  many unique
  values.
 
  FWIW,
  Erick
 
 
  On Fri, Nov 2, 2012 at 11:32 AM, Gerald Blanck 
  gerald.bla...@barometerit.com wrote:
 
   At a high level, I have a need to be able to execute a query that
 joins
   across cores, and that query during its joining may join back to
 the
   originating core.
  
   Example:
   Find all Books written by an Author who has written a best
 selling Book.
  
   In Solr query syntax
   A) against the book core - bestseller:true
   B) against the author core - {!join fromIndex=book from=id
   to=bookid}bestseller:true
   C) against the book core - {!join fromIndex=author from=id
   to=authorid}{!join fromIndex=book from=id
 to=bookid}bestseller:true
  
   A - returns results
   B - returns results
   C - does not return results
  
   Given that A and C use the same core, I started looking for join
 code
  that
   compares the originating core to the fromIndex and found this
   in JoinQParserPlugin (line #159).
  
   if (info.getReq().getCore() == fromCore) {
  
 // if this is the same core, use the searcher passed
 in...
   otherwise we could be warming and
  
 // get an older searcher from the core.
  
 fromSearcher = searcher;
  
   } else {
  
 // This could block if there is a static warming query
 with a
   join in it, and if useColdSearcher is true.
  
 // Deadlock could result if two cores both had
 useColdSearcher
   and had joins that used eachother.