Re: AutoSoftcommit option solr 4.0

2012-11-26 Thread Vadim Kisselmann
Hi Shaveta,
Simple: index a doc and then search for it ;)
A soft commit enables Near Real Time Search, so it can take a couple
of seconds before the doc becomes visible,
but it should be there.
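
A minimal SolrJ sketch of that check (the core URL, the uniqueKey field "id"
and the 2 second wait are assumptions to adapt; the wait just needs to exceed
the 1000 ms autoSoftCommit window from the config quoted below):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SoftCommitCheck {
  public static void main(String[] args) throws Exception {
    // hypothetical core URL - point this at your own core
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "softcommit-test-1");
    server.add(doc);                  // no explicit commit on purpose
    Thread.sleep(2000);               // wait past the autoSoftCommit window
    long found = server.query(new SolrQuery("id:softcommit-test-1"))
                       .getResults().getNumFound();
    System.out.println(found > 0 ? "soft commit made the doc visible"
                                 : "doc not visible yet");
  }
}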
Best regards
Vadim


2012/11/26 Shaveta_Chawla shaveta.cha...@knimbus.com:
 I have migrated from solr 3.6 to solr 4.0. I have implemented solr 4.0's auto
 commit option by adding
 <autoSoftCommit>
   <maxTime>1000</maxTime>
 </autoSoftCommit>
 <autoCommit>
   <maxTime>6</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>
  these lines in solrconfig.xml.

 I am doing these changes on my local machine. I know what the autoSoftCommit
 feature does, but how can I check that the autoCommit feature is working OK?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/AutoSoftcommit-option-solr-4-0-tp4022302.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Out Of Memory =( Too many cores on one server?

2012-11-16 Thread Vadim Kisselmann
Hi,
your JVM needs more RAM. My setup works well with 10 cores and 300 million
docs, Xmx 8GB, Xms 8GB, 16GB for the OS.
But as Bernd mentioned, the memory consumption depends on the
number of fields and the fieldCache.
Best Regards
Vadim



2012/11/16 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
 I guess you should give JVM more memory.

 When starting to find a good value for -Xmx I oversized and  set
 it to Xmx20G and Xms20G. Then I monitored the system and saw that JVM is
 between 5G and 10G (java7 with G1 GC).
 Now it is finally set to Xmx11G and Xms11G for my system with 1 core and 38 
 million docs.
 But JVM memory depends pretty much on number of fields in schema.xml
 and fieldCache (sortable fields).

 Regards
 Bernd

 Am 16.11.2012 09:29, schrieb stockii:
 Hello.

 If my server has been running for a while I get some OOM problems. I think the
 problem is that I am running too many cores on one server with too many
 documents.

 this is my server concept:
 14 cores.
 1 with 30 million docs
 1 with 22 million docs
 1 with growing 25 million docs
 1 with 67 million docs
 and the other cores are under 1 million docs.

 All these cores run fine in one Jetty, searching is very fast and
 we are satisfied with this.
 Yesterday we got an OOM.

 Do you think that we should move the big cores into another virtual
 instance of the server, so that the JVMs do not share the memory and go OOM?
 We start with: MEMORY_OPTIONS=-Xmx6g -Xms2G -Xmn1G



Re: Re: how solr4.0 and zookeeper run on weblogic

2012-10-18 Thread Vadim Kisselmann
Hi,
what does your update/add command look like?
Regards
Vadim


2012/10/18 rayvicky zongwei...@gmail.com:
 I made it work on WebLogic,
 but when I add or update the index, it errors with:


 2012-10-17 03:47:03 PM CST Error HTTP Session BEA-100060 An 
 unexpected error occurred while retrieving the session for Web application: 
 weblogic.servlet.internal.WebAppServletContext@425eab87 - appName: 'solr', 
 name: 'solr', context-path: '/solr', spec-version: '2.5'.
 weblogic.utils.NestedRuntimeException: Cannot parse POST parameters of 
 request: '/solr/collection1/update'
 at 
 weblogic.servlet.internal.ServletRequestImpl$RequestParameters.mergePostParams(ServletRequestImpl.java:2021)
 at 
 weblogic.servlet.internal.ServletRequestImpl$RequestParameters.parseQueryParams(ServletRequestImpl.java:1901)
 at 
 weblogic.servlet.internal.ServletRequestImpl$RequestParameters.peekParameter(ServletRequestImpl.java:2047)
 at 
 weblogic.servlet.internal.ServletRequestImpl$SessionHelper.initSessionInfoWithContext(ServletRequestImpl.java:2602)
 at 
 weblogic.servlet.internal.ServletRequestImpl$SessionHelper.initSessionInfo(ServletRequestImpl.java:2506)
 Truncated. see log file for complete stacktrace
 java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.read(SocketInputStream.java:129)
 at 
 weblogic.servlet.internal.PostInputStream.read(PostInputStream.java:142)
 at 
 weblogic.utils.http.HttpChunkInputStream.readChunkSize(HttpChunkInputStream.java:109)
 at 
 weblogic.utils.http.HttpChunkInputStream.initChunk(HttpChunkInputStream.java:71)
 Truncated. see log file for complete stacktrace

 2012-10-17 03:47:03 PM CST Error HTTP BEA-101020 
 [weblogic.servlet.internal.WebAppServletContext@425eab87 - appName: 'solr', 
 name: 'solr', context-path: '/solr', spec-version: '2.5'] Servlet failed with 
 Exception
 java.lang.IllegalStateException: Failed to retrieve session: Cannot parse 
 POST parameters of request: '/solr/collection1/update'
 at 
 weblogic.servlet.security.internal.SecurityModule.getUserSession(SecurityModule.java:486)
 at 
 weblogic.servlet.security.internal.ServletSecurityManager.checkAccess(ServletSecurityManager.java:81)
 at 
 weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2116)
 at 
 weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
 at 
 weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
 Truncated. see log file for complete stacktrace

 2012-10-17 03:47:03 PM CST Error Kernel BEA-000802 ExecuteRequest 
 failed
  weblogic.utils.NestedRuntimeException: Cannot parse POST parameters of 
 request: '/solr/collection1/update'.
 weblogic.utils.NestedRuntimeException: Cannot parse POST parameters of 
 request: '/solr/collection1/update'
 at 
 weblogic.servlet.internal.ServletRequestImpl$RequestParameters.mergePostParams(ServletRequestImpl.java:2021)
 at 
 weblogic.servlet.internal.ServletRequestImpl$RequestParameters.parseQueryParams(ServletRequestImpl.java:1901)
 at 
 weblogic.servlet.internal.ServletRequestImpl$RequestParameters.peekParameter(ServletRequestImpl.java:2047)
 at 
 weblogic.servlet.internal.ServletRequestImpl$SessionHelper.initSessionInfoWithContext(ServletRequestImpl.java:2602)
 at 
 weblogic.servlet.internal.ServletRequestImpl$SessionHelper.initSessionInfo(ServletRequestImpl.java:2506)
 Truncated. see log file for complete stacktrace
 java.io.IOException: Malformed chunk
 at 
 weblogic.utils.http.HttpChunkInputStream.initChunk(HttpChunkInputStream.java:67)
 at 
 weblogic.utils.http.HttpChunkInputStream.read(HttpChunkInputStream.java:142)
 at 
 weblogic.utils.http.HttpChunkInputStream.read(HttpChunkInputStream.java:182)
 at 
 weblogic.servlet.internal.ServletInputStreamImpl.read(ServletInputStreamImpl.java:222)
 at 
 weblogic.servlet.internal.ServletRequestImpl$RequestParameters.mergePostParams(ServletRequestImpl.java:1995)
 Truncated. see log file for complete stacktrace


 how to handle it ?

 thanks,
 ray.


 2012-10-18



 zongweilei



 From: Jan Høydahl / Cominvent [via Lucene]
 Sent: 2012-10-17  23:13:10
 To: rayvicky
 Cc:
 Subject: Re: how solr4.0 and zookeeper run on weblogic

 Did it work for you? You probably also have to set -Djetty.port=8080 in order 
 for local ZK not to be started on port 9983. It's confusing, but you can also 
 edit solr.xml to achieve the same.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 17 Oct 2012, at 10:06, rayvicky [hidden email] wrote:

 thanks



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-solr4-0-and-zookeeper-run-on-weblogic-tp4013882p4014167.html
 

Re: how solr4.0 and zookeeper run on weblogic

2012-10-16 Thread Vadim Kisselmann
Hi,
these are JAVA_OPTS params, you can find and set this stuff in the
startManagedWeblogic script.
Best regards
Vadim



2012/10/16 rayvicky zongwei...@gmail.com:
 Who can help me?
 Where do I set   -DzkRun -Dbootstrap_conf=true
 -DzkHost=localhost:9080   -DnumShards=2
 in WebLogic?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-solr4-0-and-zookeeper-run-on-weblogic-tp4013882.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multicore setup is ignored when deploying solr.war on Tomcat 5/6/7

2012-10-15 Thread Vadim Kisselmann
Hi Rogerio,
I can imagine what it is. Tomcat extracts the war files in
/var/lib/tomcatXX/webapps.
If you already ran an older Solr version on your server, the old
extracted Solr war could still be there (keyword: Tomcat cache).
Delete the /var/lib/tomcatXX/webapps/solr folder and restart Tomcat;
Tomcat should then deploy your new war file.
Best regards
Vadim



2012/10/14 Rogerio Pereira rogerio.ara...@gmail.com:
 I'll try to be more specific Jack.

 I just downloaded apache-solr-4.0.0.zip. From this archive I took the
 core1 and core2 folders from the multicore example and renamed them to
 collection1 and collection2. I also made all necessary changes in solr.xml,
 solrconfig.xml and schema.xml for these two cores to reflect the new
 names.

 After this step I just tried to deploy the war file on Tomcat, pointing to
 the directory (solr/home) where these two cores are located; solr.xml
 is there, with collection1 and collection2 properly configured.

 The question is: no matter what solr.xml contains, this file isn't
 read at Tomcat startup. I tried to cause a parser error in solr.xml by
 removing closing tags, but even with this change I can't get so much as a
 parser error.

 I hope to be clear now.


 2012/10/14 Jack Krupansky j...@basetechnology.com

 I can't quite parse "the same multicore deployment as we have on apache
 solr 4.0 distribution archive". Could you rephrase and be more specific?
 What archive?

 Were you already using 4.0-ALPHA or BETA (or some snapshot of 4.0) or are
 you moving from pre-4.0 to 4.0? The directory structure did change in 4.0.
 Look at the example/solr directory.

 -- Jack Krupansky

 -Original Message- From: Rogerio Pereira
 Sent: Sunday, October 14, 2012 10:01 AM
 To: solr-user@lucene.apache.org
 Subject: Multicore setup is ignored when deploying solr.war on Tomcat 5/6/7


 Hi,

 I tried to perform the same multicore deployment as we have in the apache solr
 4.0 distribution archive. I created a directory for solr/home with solr.xml
 inside and two subdirectories, collection1 and collection2; these two cores
 are properly configured with a conf folder, solrconfig.xml and schema.xml.
 On Tomcat I set up the system property pointing to the solr/home path, but
 unfortunately when I start Tomcat the solr.xml is ignored and only the
 default collection1 is loaded.

 As a test, I made changes on solr.xml to cause parser errors, and guess
 what? These errors aren't reported on tomcat startup.

 The same thing doesn't happen with the multicore example that comes in the
 distribution archive; now I'm trying to figure out what black magic is
 happening.

 Let me do the same kind of deployment on Windows and Mac OSX; if the problem
 persists, I'll update this thread.

 Regards,

 Rogério




 --
 Regards,

 Rogério Pereira Araújo

 Blogs: http://faces.eti.br, http://ararog.blogspot.com
 Twitter: http://twitter.com/ararog
 Skype: rogerio.araujo
 MSN: ara...@hotmail.com
 Gtalk/FaceTime: rogerio.ara...@gmail.com

 (0xx62) 8240 7212
 (0xx62) 3920 2666


Re: Proximity(tilde) combined with wildcard, AutomatonQuery ?

2012-10-05 Thread Vadim Kisselmann
Hi Ahmet,
thank you, it sounds great:)
I will test it in the next days and give feedback.
Best regards
Vadim



2012/10/5 Ahmet Arslan iori...@yahoo.com:
 Hi Vadim,

 I attached a zip (solr plugin) file to SOLR-1604. This is not a patch. It is 
 supposed to work with solr 4.0. Some tests fail but it should work with 
 "pol* tel*"~5 types of queries.

 Ahmet

 --- On Thu, 9/27/12, Vadim Kisselmann v.kisselm...@gmail.com wrote:

 From: Vadim Kisselmann v.kisselm...@gmail.com
 Subject: Re: Proximity(tilde) combined with wildcard, AutomatonQuery ?
 To: solr-user@lucene.apache.org
 Date: Thursday, September 27, 2012, 10:38 AM
 Hi Ahmet,
 thanks for your reply:)
 I see that it does not come with the 4.0 release, because
 the given
 patches do not work with this version.
 Right?
 Best regards
 Vadim


 2012/9/26 Ahmet Arslan iori...@yahoo.com:
 
  we assume i have a simple query like this with
 wildcard and
  tilde:
 
  "japa* fukushima"~10
 
  instead of "japan fukushima"~10 OR "japanese
 fukushima"~10,
  etc.
 
  Do we have a solution in Solr 4.0 to work with
 these kind of
  queries?
 
  Vadim, two open jira issues:
 
  https://issues.apache.org/jira/browse/SOLR-1604
  https://issues.apache.org/jira/browse/LUCENE-1486
 



Re: Proximity(tilde) combined with wildcard, AutomatonQuery ?

2012-09-27 Thread Vadim Kisselmann
Hi Ahmet,
thanks for your reply:)
I see that it does not come with the 4.0 release, because the given
patches do not work with this version.
Right?
Best regards
Vadim


2012/9/26 Ahmet Arslan iori...@yahoo.com:

 we assume i have a simple query like this with wildcard and
 tilde:

 "japa* fukushima"~10

 instead of "japan fukushima"~10 OR "japanese fukushima"~10,
 etc.

 Do we have a solution in Solr 4.0 to work with these kind of
 queries?

 Vadim, two open jira issues:

 https://issues.apache.org/jira/browse/SOLR-1604
 https://issues.apache.org/jira/browse/LUCENE-1486



Re: How to run Solr Cloud using Tomcat?

2012-09-27 Thread Vadim Kisselmann
Hi Roy,
Yep, it works with Tomcat 6 and an external ZooKeeper.
I will publish a blog post about it tomorrow on sentric.ch.
The post is ready, but I had no time to publish it in the last
couple of days :)
Best regards
Vadim



2012/9/27 Markus Jelsma markus.jel...@openindex.io:
 Hi - on Debian systems there's a /etc/default/tomcat properties file you can 
 use to set your flags.

 -Original message-
 From:Benjamin, Roy rbenja...@ebay.com
 Sent: Thu 27-Sep-2012 19:57
 To: solr-user@lucene.apache.org
 Subject: How to run Solr Cloud using Tomcat?

 I've gone through the guide on running Solr Cloud using Jetty but it's not
 practical to use JAVA_OPTS etc on real cloud deployments. I don't see how
 to extend these instructions to running on Tomcat.

 Has anyone run Solr Cloud under Tomcat successfully?  Did they document how?

 Thanks

 Roy



Proximity(tilde) combined with wildcard, AutomatonQuery ?

2012-09-26 Thread Vadim Kisselmann
Hi guys,

Assume I have a simple query like this, with a wildcard and a tilde:

"japa* fukushima"~10

instead of "japan fukushima"~10 OR "japanese fukushima"~10, etc.

Do we have a solution in Solr 4.0 to work with these kind of queries?
Does the AutomatonQuery/Filter cover this case?

Best regards
Vadim


Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

2012-08-28 Thread Vadim Kisselmann
Hi Claudio,
great to hear that it works.
Everyone can edit the wiki; you only need to log in.
Regards
Vadim


2012/8/27 Claudio Ranieri claudio.rani...@estadao.com:
 I solved the problem.
 I added the parameter sharedLib="lib" in $SOLR_HOME/solr.xml (<solr 
 persistent="true" sharedLib="lib">) and moved all jars from 
 $TOMCAT_HOME/webapps/solr/WEB-INF/lib to $SOLR_HOME/lib.
 This information could be included in the Solr/Tomcat wiki.

 Claudio Ranieri | Especialista Sistemas de Busca | S.A O Estado de S.Paulo
 Av. Eng. Caetano Álvares, 55 - Limão - São Paulo - SP - 02598-900
 + 55 11 3856-5790 | + 55 11 9344-2674





 -----Original Message-----
 From: Claudio Ranieri [mailto:claudio.rani...@estadao.com]
 Sent: Monday, 27 August 2012 10:34
 To: solr-user@lucene.apache.org
 Subject: RE: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

 Can anyone help me?


 -----Original Message-----
 From: Claudio Ranieri [mailto:claudio.rani...@estadao.com]
 Sent: Friday, 24 August 2012 11:40
 To: solr-user@lucene.apache.org
 Subject: RE: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

 Hi Vadim,
 No, I used the entire apache-solr-4.0.0-BETA\example\solr (schema.xml, 
 solrconfig.xml ...)


 -----Original Message-----
 From: Vadim Kisselmann [mailto:v.kisselm...@gmail.com] Sent: Friday, 
 24 August 2012 07:26
 To: solr-user@lucene.apache.org
 Subject: Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

 A guess:
 do you use your old solrconfig.xml files from older installations?
 If yes, compare the default config with yours.


 2012/8/23 Claudio Ranieri claudio.rani...@estadao.com:
 I made this installation on a new Tomcat.
 Solr 3.4.*, 3.5.*, 3.6.* work with the jars in 
 $TOMCAT_HOME/webapps/solr/WEB-INF/lib, but solr 4.0 beta doesn't. 
 I needed to add the jars to $TOMCAT_HOME/lib.
 The problem with the cast seems to be in the source code.


 -----Original Message-----
 From: Karthick Duraisamy Soundararaj
 [mailto:karthick.soundara...@gmail.com]
 Sent: Thursday, 23 August 2012 09:22
 To: solr-user@lucene.apache.org
 Subject: Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

 Not sure if this can help. But once I had a similar problem with Solr 3.6.0 
 where tomcat refused to find one of the classes that existed. I deleted the 
 tomcat's webapp directory and then it worked fine.

 On Thu, Aug 23, 2012 at 8:19 AM, Erick Erickson 
 erickerick...@gmail.comwrote:

 First, I'm no Tomcat expert; here's the Tomcat Solr page, but
 you've probably already seen it:
 http://wiki.apache.org/solr/SolrTomcat

 But I'm guessing that you may have old jars around somewhere and
 things are getting confused. I'd blow away the whole thing and start
 over, whenever I start copying jars around I always lose track of
 what's where.

 Have you successfully had any other Solr operate under Tomcat?

 Sorry I can't be more help
 Erick

 On Wed, Aug 22, 2012 at 9:47 AM, Claudio Ranieri
 claudio.rani...@estadao.com wrote:
  Hi,
 
  I tried to start the solr-4.0.0-BETA with tomcat-6.0.20 but does
  not
 work.
  I copied the apache-solr-4.0.0-BETA.war to $TOMCAT_HOME/webapps.
  Then I
 copied the directory apache-solr-4.0.0-BETA\example\solr to
 C:\home\solr-4.0-beta and adjusted the file
 $TOMCAT_HOME\conf\Catalina\localhost\apache-solr-4.0.0-BETA.xml to
 point the solr/home to C:/home/solr-4.0-beta. With this
 configuration, when I startup tomcat I got:
 
  SEVERE: org.apache.solr.common.SolrException: Invalid
  luceneMatchVersion
 'LUCENE_40', valid values are: [LUCENE_20, LUCENE_21, LUCENE_22,
 LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31, LUCENE_32,
 LUCENE_33, LUCENE_34, LUCENE_35, LUCENE_36, LUCENE_CURRENT ] or a string in 
 format 'VV'
 
  So I changed the line in solrconfig.xml:
 
  <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
 
  to
 
  <luceneMatchVersion>LUCENE_CURRENT</luceneMatchVersion>
 
  So I got a new error:
 
  Caused by: java.lang.ClassNotFoundException:
 solr.NRTCachingDirectoryFactory
 
  This class is within the file apache-solr-core-4.0.0-BETA.jar but
  for
 some reason classloader of the class is not loaded. I then moved all
 jars in $TOMCAT_HOME\webapps\apache-solr-4.0.0-BETA\WEB-INF\lib to
 $TOMCAT_HOME\lib.
  After this setup, I got a new error:
 
  SEVERE: java.lang.ClassCastException:
 org.apache.solr.core.NRTCachingDirectoryFactory can not be cast to
 org.apache.solr.core.DirectoryFactory
 
  So I changed the line in solrconfig.xml:
 
  <directoryFactory name="DirectoryFactory"
 
 class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
 
  to
 
  <directoryFactory name="DirectoryFactory"
 
 class="${solr.directoryFactory:solr.NIOFSDirectoryFactory}"/>
 
  So I got a new error:
 
  Caused by: java.lang.ClassCastException:
 org.apache.solr.spelling.DirectSolrSpellChecker can not be cast to
 org.apache.solr.spelling.SolrSpellChecker
 
  How can I resolve the problem of classloader?
  How can I

Re: flush (delete all document) solr 4 Beta

2012-08-27 Thread Vadim Kisselmann
Your docs are only marked as deleted.
You should optimize after the commit, then they are really removed.
It's easier and faster to stop your Jetty/Tomcat, drop your index
directory and start your servlet container again...
When that's not possible, optimize.
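
A short SolrJ sketch of that sequence (the core URL is a placeholder;
otherwise it mirrors the setup in the quoted code below):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class FlushIndex {
  public static void main(String[] args) throws Exception {
    // hypothetical core URL - adjust to your setup
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    server.deleteByQuery("*:*");   // marks all docs as deleted
    server.commit(true, true);     // waitFlush, waitSearcher
    server.optimize();             // merge segments, physically drop the deleted docs
  }
}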
regards
Vadim


2012/8/27 Jamel ESSOUSSI jamel.essou...@gmail.com:
 Hi,

 I should flush solr (delete all existing documents)
 -- for doing this, I have the following code:


 HttpSolrServer server = new HttpSolrServer(url);

 server.setSoTimeout(1000);
 server.setConnectionTimeout(100);
 server.setDefaultMaxConnectionsPerHost(100);
 server.setMaxTotalConnections(100);
 server.setFollowRedirects(false);
 server.setAllowCompression(true);
 server.setMaxRetries(1);
 server.setParser(new XMLResponseParser());

 UpdateResponse ur = server.deleteByQuery("*:*");

 server.commit(true, true);

 As a result, I still have all documents -- ur.getStatus() is 0
 and the Solr documents were not deleted.

 -- I have no server or client errors.

 Can you explain to me why it did not work?

 Thanks











 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/flush-delete-all-document-solr-4-Beta-tp4003434.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

2012-08-24 Thread Vadim Kisselmann
A guess:
do you use your old solrconfig.xml files from older installations?
If yes, compare the default config with yours.


2012/8/23 Claudio Ranieri claudio.rani...@estadao.com:
 I made this installation on a new Tomcat.
 Solr 3.4.*, 3.5.*, 3.6.* work with the jars in 
 $TOMCAT_HOME/webapps/solr/WEB-INF/lib, but solr 4.0 beta doesn't. I 
 needed to add the jars to $TOMCAT_HOME/lib.
 The problem with the cast seems to be in the source code.


 -----Original Message-----
 From: Karthick Duraisamy Soundararaj [mailto:karthick.soundara...@gmail.com]
 Sent: Thursday, 23 August 2012 09:22
 To: solr-user@lucene.apache.org
 Subject: Re: Problem to start solr-4.0.0-BETA with tomcat-6.0.20

 Not sure if this can help. But once I had a similar problem with Solr 3.6.0 
 where tomcat refused to find one of the classes that existed. I deleted the 
 tomcat's webapp directory and then it worked fine.

 On Thu, Aug 23, 2012 at 8:19 AM, Erick Erickson 
 erickerick...@gmail.comwrote:

 First, I'm no Tomcat expert; here's the Tomcat Solr page, but
 you've probably already seen it:
 http://wiki.apache.org/solr/SolrTomcat

 But I'm guessing that you may have old jars around somewhere and
 things are getting confused. I'd blow away the whole thing and start
 over, whenever I start copying jars around I always lose track of
 what's where.

 Have you successfully had any other Solr operate under Tomcat?

 Sorry I can't be more help
 Erick

 On Wed, Aug 22, 2012 at 9:47 AM, Claudio Ranieri
 claudio.rani...@estadao.com wrote:
  Hi,
 
  I tried to start the solr-4.0.0-BETA with tomcat-6.0.20 but does not
 work.
  I copied the apache-solr-4.0.0-BETA.war to $TOMCAT_HOME/webapps.
  Then I
 copied the directory apache-solr-4.0.0-BETA\example\solr to
 C:\home\solr-4.0-beta and adjusted the file
 $TOMCAT_HOME\conf\Catalina\localhost\apache-solr-4.0.0-BETA.xml to
 point the solr/home to C:/home/solr-4.0-beta. With this configuration,
 when I startup tomcat I got:
 
  SEVERE: org.apache.solr.common.SolrException: Invalid
  luceneMatchVersion
 'LUCENE_40', valid values are: [LUCENE_20, LUCENE_21, LUCENE_22,
 LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31, LUCENE_32,
 LUCENE_33, LUCENE_34, LUCENE_35, LUCENE_36, LUCENE_CURRENT ] or a string in 
 format 'VV'
 
  So I changed the line in solrconfig.xml:
 
  <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
 
  to
 
  <luceneMatchVersion>LUCENE_CURRENT</luceneMatchVersion>
 
  So I got a new error:
 
  Caused by: java.lang.ClassNotFoundException:
 solr.NRTCachingDirectoryFactory
 
  This class is within the file apache-solr-core-4.0.0-BETA.jar but
  for
 some reason classloader of the class is not loaded. I then moved all
 jars in $TOMCAT_HOME\webapps\apache-solr-4.0.0-BETA\WEB-INF\lib to
 $TOMCAT_HOME\lib.
  After this setup, I got a new error:
 
  SEVERE: java.lang.ClassCastException:
 org.apache.solr.core.NRTCachingDirectoryFactory can not be cast to
 org.apache.solr.core.DirectoryFactory
 
  So I changed the line in solrconfig.xml:
 
  <directoryFactory name="DirectoryFactory"
 
 class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
 
  to
 
  <directoryFactory name="DirectoryFactory"
 
 class="${solr.directoryFactory:solr.NIOFSDirectoryFactory}"/>
 
  So I got a new error:
 
  Caused by: java.lang.ClassCastException:
 org.apache.solr.spelling.DirectSolrSpellChecker can not be cast to
 org.apache.solr.spelling.SolrSpellChecker
 
  How can I resolve the problem of classloader?
  How can I resolve the problem of cast of NRTCachingDirectoryFactory
  and
 DirectSolrSpellChecker?
  I can not startup the solr 4.0 beta with tomcat.
  Thanks,
 
 
 
 




 --
 --
 Karthick D S
 Master's in Computer Engineering ( Software Track ) Syracuse University 
 Syracuse - 13210 New York United States of America


Does SolrEntityProcessor fulfill my requirements?

2012-07-18 Thread Vadim Kisselmann
Hi folks,

i have this case:
I want to update my Solr 4.0 from trunk to Solr 4.0 alpha. The index
structure has changed, so I can't replicate.
10 cores are in use, each with 30 million docs. We assume that all fields
are stored and indexed.
What is the best way to export the docs from all cores on one machine
with Solr 4.0 trunk to identically named cores on another machine with Solr 4.0
alpha?
SolrEntityProcessor can be one solution, but does it work with this
amount of data? I want to reindex all docs at once and not in small
parts, and I find no examples
of bigger reindexing attempts with SolrEntityProcessor.
XSLT as option two?
What would be the best solution for this, what do you think?
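
For comparison, a rough SolrJ sketch of copying docs between the two instances
outside of DIH (host names, page size and the uniqueKey field "id" are
assumptions; with 30 million docs per core you would commit less often and
possibly parallelize, this only shows the idea):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

public class CopyCore {
  public static void main(String[] args) throws Exception {
    HttpSolrServer source = new HttpSolrServer("http://old-host:8983/solr/core1"); // hypothetical
    HttpSolrServer target = new HttpSolrServer("http://new-host:8983/solr/core1"); // hypothetical
    int rows = 1000;
    for (int start = 0; ; start += rows) {
      SolrQuery q = new SolrQuery("*:*");
      q.setStart(start);
      q.setRows(rows);
      q.addSortField("id", SolrQuery.ORDER.asc);  // stable order while paging
      SolrDocumentList page = source.query(q).getResults();
      if (page.isEmpty()) break;
      List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
      for (SolrDocument d : page) {
        SolrInputDocument in = new SolrInputDocument();
        for (String f : d.getFieldNames()) {
          if ("_version_".equals(f)) continue;    // skip internal fields
          in.addField(f, d.getFieldValue(f));     // copies stored fields only
        }
        batch.add(in);
      }
      target.add(batch);
    }
    target.commit();
  }
}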

Best Regards
Vadim


Re: Pb installation Solr/Tomcat6

2012-07-14 Thread Vadim Kisselmann
Same problem:
here, tomcat6 needs the right to read and write your index.
regards
vadim


2012/7/14 Bruno Mannina bmann...@free.fr:
 I found the problem, I think: it was a permission problem on schema.xml.

 schema.xml was only readable by the solr user.

 Now I have the same problem with the solr index directory

 On 14/07/2012 14:00, Bruno Mannina wrote:

 Dear Solr users,

 I am trying to run solr/ with Tomcat but I always get this error:
 Can't find resource 'schema.xml' in classpath or
 '/home/solr/apache-solr-3.6.0/example/solr/./conf/', cwd='/var/lib/tomcat6

 but schema.xml is inside the directory
 '/home/solr/apache-solr-3.6.0/example/solr/./conf/'

 http://localhost:8080/manager/html = works fine; I see Applications
 /solr, functional: True

 but when I click on solr/ (http://localhost:8080/solr/) I get this error.

 Could you help me solve this problem? It's driving me crazy.

 thanks a lot,
 Bruno


 Tomcat6
 Ubuntu 12.04
 Solr 3.6






Re: Trunk error in Tomcat

2012-07-12 Thread Vadim Kisselmann
It works, with a few changes :) I think we don't need a new issue in Jira.

Solr 4.0 from trunk is no longer the same Solr 4.0 as in late February.
There have been some changes in solrconfig.xml in this time.
I migrated my Solr 4.0 trunk config, which worked until late February, into
a new config from 4.0 alpha.

A couple of changes which I noticed:
- abortOnConfigurationError:true is gone
- luceneMatchVersion was changed to LUCENE_50
- a couple of new jars included for velocity and lang
- new directoryFactory = solr.directoryFactory:solr.NRTCachingDirectoryFactory
- indexDefaults replaced by indexConfig
- updateLog added
- replication handler for SolrCloud added
- names for handlers were changed, like /select for search
- new handler added: <requestHandler name="/get"
class="solr.RealTimeGetHandler">
and so on...

 This AdminHandler exception is still there when I use the
clusteringComponent, see here:
SCHWERWIEGEND: null:org.apache.solr.common.SolrException: Error
loading class 'solr.clustering.ClusteringComponent'

But if I comment it out, Solr starts without errors.
The path to the clustering jar in ../contrib/clustering/lib/ is
correct and the needed jars are there; maybe we need new
jar files?

Best regards
Vadim




2012/7/5 Stefan Matheis matheis.ste...@googlemail.com:
 Great, thanks Vadim



 On Thursday, July 5, 2012 at 9:34 AM, Vadim Kisselmann wrote:

 Hi Stefan,
 ok, i would test the latest version from trunk with tomcat in next
 days and open an new issue:)
 regards
 Vadim


 2012/7/3 Stefan Matheis matheis.ste...@googlemail.com 
 (mailto:matheis.ste...@googlemail.com):
  On Tuesday, July 3, 2012 at 8:10 PM, Vadim Kisselmann wrote:
   sorry, i overlooked your latest comment with the new issue in SOLR-3238 
   ;)
   Should i open an new issue?
 
 
 
 
  NP Vadim, yes a new Issue would help .. all available Information too :)




Re: Solr 4.0 IllegalStateException: this writer hit an OutOfMemoryError; cannot commit

2012-07-11 Thread Vadim Kisselmann
Hi Simon,
i checked my log files one more time to get the error timestamps.
I get the first Error at 14:37:

06.07.2012 14:37:52 org.apache.solr.common.SolrException log
SCHWERWIEGEND: null:ClientAbortException:  java.net.SocketException: Broken pipe
at 
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:358)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:323)

Next one, and the first Java heap Space error at 17:35:
06.07.2012 17:35:36 org.apache.solr.common.SolrException log
SCHWERWIEGEND: null:java.lang.OutOfMemoryError: Java heap space
at 
org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.init(FreqProxTermsWriterPerField.java:248)
at 
org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:269)
at 
org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
at 
org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray.grow(TermsHashPerField.java:307)
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:330)

Commit failure a couple of seconds later:
06.07.2012 17:35:38 org.apache.solr.common.SolrException log
SCHWERWIEGEND: auto commit error...:java.lang.IllegalStateException:
this writer hit an OutOfMemoryError; cannot commit
at 
org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2650)

followed by 10 Java heap space exceptions, and one minute later, at 17:36,
the first auto-warming exception:
06.07.2012 17:36:26 org.apache.solr.common.SolrException log
SCHWERWIEGEND: Error during auto-warming of key:pubDate:[1340971496000
TO 1341576296000]:java.lang.OutOfMemoryError: Java heap space
06.07.2012 17:36:28 org.apache.solr.common.SolrException log
SCHWERWIEGEND: Error during auto-warming of key:pubDate:[1340971495000
TO 1341576295000]:java.lang.OutOfMemoryError: Java heap space

 it really seems that you are hitting an OOM during auto warming. can
 this be the case for your failure.
 Can you raise the JVM memory and see if you still hit the spike and go
 OOM? this is very unlikely a IndexWriter problem. I'd rather look at
 your warmup queries ie. fieldcache, FieldValueCache usage. Are you
 sorting / facet on anything?

The auto-warming problems began one minute after the Java heap
exceptions, so I think these are follow-on problems.
I configured very small caches (max. sizes between 512 and 2048) for my use case.
The warming queries look like this, with sorting, but without faceting:
<lst>
   <str name="q">ag</str>
   <str name="fq">pubDate:[NOW-1DAY TO *]</str>
   <str name="sort">pubDate desc</str>
</lst>

Do you think that 8GB for the JVM is not enough? Raising the JVM memory
could solve the problem...
As mentioned, this server ran a long time with the same config
without problems; I am surprised that this problem appeared all at once,
without heavy usage... Now it's running smoothly again after the restart
yesterday, so I don't know when the problem will appear again.

I'll try to update to 4.0 alpha today, run it with Tomcat, and report back :)

Best regards
Vadim





2012/7/10 Simon Willnauer simon.willna...@gmail.com:
 it really seems that you are hitting an OOM during auto warming. can
 this be the case for your failure.
 Can you raise the JVM memory and see if you still hit the spike and go
 OOM? this is very unlikely a IndexWriter problem. I'd rather look at
 your warmup queries ie. fieldcache, FieldValueCache usage. Are you
 sorting / facet on anything?

 simon

 On Tue, Jul 10, 2012 at 4:49 PM, Vadim Kisselmann
 v.kisselm...@gmail.com wrote:
 Hi Robert,

 Can you run Lucene's checkIndex tool on your index?

 No, unfortunately not. This Solr should run without stoppage, an
 tomcat-restart is ok, but not more:)
 I tested newer trunk-versions a couple of months ago, but they fail
 all with tomcat.
 i would test 4.0-alpha in next days with tomcat and open an jira-issue
 if it doesn't work with it.

 do you have another exception in your logs? To my knowledge, in all
 cases that IndexWriter throws an OutOfMemoryError, the original
 OutOfMemoryError is also rethrown (not just this IllegalStateException
 noting that at some point, it hit OOM.

 Hmm, i checked older logs and found something new, what i have not
 seen in VisualVM. Java heap space-Problems, just before OOM.
 My JVM has 8GB -Xmx/-Xms, 16GB for OS, nothing else on this machine.
 This Errors pop up's during normal run according logs, no optimizes,
 high loads(max. 30 queries per minute) or something special at this time.

 SCHWERWIEGEND: null:ClientAbortException:  java.net.SocketException: Broken 
 pipe
 SCHWERWIEGEND: null:java.lang.OutOfMemoryError: Java heap space
 SCHWERWIEGEND: auto commit error...:java.lang.IllegalStateException:
 this writer hit an OutOfMemoryError; cannot commit
 SCHWERWIEGEND: Error during auto-warming of
 key:org.apache.solr.search.QueryResultKey@7cba935e:java.lang.OutOfMemoryError:
 Java

Solr 4.0 IllegalStateException: this writer hit an OutOfMemoryError; cannot commit

2012-07-10 Thread Vadim Kisselmann
Hi folks,
my test server with Solr 4.0 from trunk (version 1292064 from late
February) throws this exception...


auto commit error...:java.lang.IllegalStateException: this writer hit
an OutOfMemoryError; cannot commit
at 
org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2650)
at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2804)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2786)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:391)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:197)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


My server has 24GB RAM, 8GB for the JVM. I index roughly 20 docs per
second; my index is small, with 10 million docs. It runs for
a couple of weeks and then suddenly I get these errors...
I can't see any problems with my GC in VisualVM. It's all OK: memory
consumption is about 6GB, no swapping, no I/O problems... it's all
green :)
What's going on on this machine? :)  My uncommitted docs are gone, right?

Best regards
Vadim


Re: Solr 4.0 IllegalStateException: this writer hit an OutOfMemoryError; cannot commit

2012-07-10 Thread Vadim Kisselmann
Hi Robert,

 Can you run Lucene's checkIndex tool on your index?

No, unfortunately not. This Solr should run without stoppage; a
Tomcat restart is OK, but not more :)
I tested newer trunk versions a couple of months ago, but they all fail
with Tomcat.
I will test 4.0-alpha with Tomcat in the next days and open a Jira issue
if it doesn't work.
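
For reference, a minimal sketch of what invoking Lucene's CheckIndex from code
could look like (the index path is a placeholder; it is usually run as a
standalone tool, ideally against a copy of the index while Solr is stopped):

import java.io.File;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.FSDirectory;

public class CheckIndexSketch {
  public static void main(String[] args) throws Exception {
    // hypothetical index directory - adjust to your core's data dir
    FSDirectory dir = FSDirectory.open(new File("/path/to/solr/data/index"));
    CheckIndex checker = new CheckIndex(dir);
    CheckIndex.Status status = checker.checkIndex();  // read-only check
    System.out.println("index clean: " + status.clean);
    dir.close();
  }
}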

 do you have another exception in your logs? To my knowledge, in all
 cases that IndexWriter throws an OutOfMemoryError, the original
 OutOfMemoryError is also rethrown (not just this IllegalStateException
 noting that at some point, it hit OOM.

Hmm, I checked older logs and found something new that I had not
seen in VisualVM: Java heap space problems just before the OOM.
My JVM has 8GB -Xmx/-Xms, 16GB for the OS, nothing else on this machine.
According to the logs these errors pop up during normal operation: no optimizes,
no high load (max. 30 queries per minute) or anything special at this time.

SCHWERWIEGEND: null:ClientAbortException:  java.net.SocketException: Broken pipe
SCHWERWIEGEND: null:java.lang.OutOfMemoryError: Java heap space
SCHWERWIEGEND: auto commit error...:java.lang.IllegalStateException:
this writer hit an OutOfMemoryError; cannot commit
SCHWERWIEGEND: Error during auto-warming of
key:org.apache.solr.search.QueryResultKey@7cba935e:java.lang.OutOfMemoryError:
Java heap space
SCHWERWIEGEND: org.apache.solr.common.SolrException: Internal Server Error
SCHWERWIEGEND: null:org.apache.solr.common.SolrException: Internal Server Error

I knew these failures from working on virtual machines with Solr 1.4,
big indexes and ridiculously small -Xmx sizes.
But on real hardware, with enough RAM and fast disks/CPUs, it's new for me :)

Best regards
Vadim


Re: Trunk error in Tomcat

2012-07-05 Thread Vadim Kisselmann
Hi Stefan,
OK, I will test the latest version from trunk with Tomcat in the next
days and open a new issue :)
regards
Vadim


2012/7/3 Stefan Matheis matheis.ste...@googlemail.com:
 On Tuesday, July 3, 2012 at 8:10 PM, Vadim Kisselmann wrote:
 sorry, i overlooked your latest comment with the new issue in SOLR-3238 ;)
 Should i open an new issue?


 NP Vadim, yes a new Issue would help .. all available Information too :)


Re: Trunk error in Tomcat

2012-07-03 Thread Vadim Kisselmann
same problem here:

https://mail.google.com/mail/u/0/?ui=2view=btopver=18zqbez0n5t35q=tomcat%20v.kisselmannqs=truesearch=queryth=13615cfb9a5064bdqt=kisselmann.1.tomcat.1.tomcat's.1.v.1cvid=3


https://issues.apache.org/jira/browse/SOLR-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230056#comment-13230056

I use an older solr trunk version from February/March; it works. With
newer versions from trunk I get the same error: This interface
requires that you activate the admin request handlers...

regards
vadim



2012/7/3 Briggs Thompson w.briggs.thomp...@gmail.com:
 Also, I forgot to include this before, but there is a client side error
 which is a failed 404 request to the below URL.

 http://localhost:8983/solr/null/admin/system?wt=json

 On Tue, Jul 3, 2012 at 8:45 AM, Briggs Thompson w.briggs.thomp...@gmail.com
 wrote:

 Thanks Erik. If anyone else has any ideas about the NoSuchFieldError issue
 please let me know. Thanks!

 -Briggs


 On Mon, Jul 2, 2012 at 6:27 PM, Erik Hatcher erik.hatc...@gmail.comwrote:

 Interestingly, I just logged the issue of it not showing the right error
 in the UI here: https://issues.apache.org/jira/browse/SOLR-3591

 As for your specific issue, not sure, but the error should at least also
 show in the admin view.

 Erik


 On Jul 2, 2012, at 18:59 , Briggs Thompson wrote:

  Hi All,
 
  I just grabbed the latest version of trunk and am having a hard time
  getting it running properly in tomcat. It does work fine in Jetty. The
  admin screen gives the following error:
  This interface requires that you activate the admin request handlers,
 add
  the following configuration to your  Solrconfig.xml
 
  I am pretty certain the front end error has nothing to do with the
 actual
  error. I have seen some other folks on the distro with the same problem,
  but none of the threads have a solution (that I could find). Below is
 the
  stack trace. I also tried with different versions of Lucene but none
  worked. Note: my index is EMPTY and I am not migrating over an index
 build
  with a previous version of lucene. I think I ran into this a while ago
 with
  an earlier version of trunk, but I don't recall doing anything to fix
 it.
  Anyhow, if anyone has an idea with this one, please let me know.
 
  Thanks!
  Briggs Thompson
 
  SEVERE: null:java.lang.NoSuchFieldError: LUCENE_50
  at
 
 org.apache.solr.analysis.SynonymFilterFactory$1.createComponents(SynonymFilterFactory.java:83)
  at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:83)
  at
 
 org.apache.lucene.analysis.synonym.SynonymMap$Builder.analyze(SynonymMap.java:120)
  at
 
 org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:99)
  at
 
 org.apache.lucene.analysis.synonym.SolrSynonymParser.add(SolrSynonymParser.java:70)
  at
 
 org.apache.solr.analysis.SynonymFilterFactory.loadSolrSynonyms(SynonymFilterFactory.java:131)
  at
 
 org.apache.solr.analysis.SynonymFilterFactory.inform(SynonymFilterFactory.java:93)
  at
 
 org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:584)
  at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:112)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:812)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:510)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:333)
  at
 
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:282)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101)
  at
 
 org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
  at
 
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
  at
 
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
  at
 
 org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:103)
  at
 
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4649)
  at
 
 org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5305)
  at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
  at
 
 org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:899)
  at
 org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:875)
  at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618)
  at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:963)
  at
 
 org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1600)
  at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at
 
 

Re: Trunk error in Tomcat

2012-07-03 Thread Vadim Kisselmann
Hi Stefan,
sorry, I overlooked your latest comment about the new issue in SOLR-3238 ;)
Should I open a new issue? I haven't tested it with newer
trunk versions for a couple of months because
SolrCloud with an external ZK and Tomcat fails too, but I can do it
and post all the errors which I find in my log files.
Regards
Vadim



2012/7/3 Stefan Matheis matheis.ste...@googlemail.com:
 Hey Vadim

 Right now JIRA is down for maintenance, but afaik there was another comment 
 asking for more information. I'll check Eric's issue today or tomorrow and 
 see how we can handle (and hopefully fix) that.

 Regards
 Stefan


 On Tuesday, July 3, 2012 at 4:00 PM, Vadim Kisselmann wrote:

 same problem here:

 https://mail.google.com/mail/u/0/?ui=2view=btopver=18zqbez0n5t35q=tomcat%20v.kisselmannqs=truesearch=queryth=13615cfb9a5064bdqt=kisselmann.1.tomcat.1.tomcat's.1.v.1cvid=3


 https://issues.apache.org/jira/browse/SOLR-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230056#comment-13230056

 i use an older solr-trunk version from february/march, it works. with
 newer versions from trunk i get the same error: This interface
 requires that you activate the admin request handlers...

 regards
 vadim





Re: Dismax Question

2012-07-02 Thread Vadim Kisselmann
In your schema.xml you can set the default query parser operator, in
your case <solrQueryParser defaultOperator="AND"/>, but it's
deprecated.
When you use edismax, read this: http://drupal.org/node/1559394 .
The mm param is the answer here.
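
For illustration, a SolrJ sketch of sending those params per request (edismax,
the qf field names and the example query are assumptions based on this thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class EdismaxMmExample {
  public static void main(String[] args) throws Exception {
    // hypothetical core URL - adjust to your setup
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("DualHead2Go");
    q.set("defType", "edismax");
    q.set("qf", "title text");   // fields to search; assumed names
    q.set("mm", "100%");         // require all split terms to match (AND-like behavior)
    q.set("debugQuery", "true"); // show the parsed query in the response
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getResults().getNumFound() + " matches");
  }
}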

Best regards
Vadim





2012/7/2 Steve Fatula compconsult...@yahoo.com:
 Let's say a user types in:

 DualHead2Go


 The way solr is working, it splits this into:

 Dual Head 2 Go

 and searches the index across various fields, finding records where any ONE of 
 them matches.

 Now, if I simply type the search terms Dual Head 2 Go, it finds records where 
 ALL of them match. This is because we set q.op to AND.

 Recently, we went from Solr 3.4 to 3.6; 3.4 used to work OK, but 3.6 seems to 
 behave differently, or perhaps we mucked something up.

 So, my question is how do we get Solr search to work with AND when it is 
 splitting words? The splitting part is good, the bad part is that it is 
 searching for any one of those split words.

 Steve


Solr 1.4, slaves hang after replication from an just optimized master

2012-06-19 Thread Vadim Kisselmann
Hi folks,

I have to look after an old live system with Solr 1.4.
When I optimize a bigger index of roughly 200GB (after the optimize
and cut, 100GB) and my slaves
replicate the newest version after(!) the optimize, they all hang at
100% in replication and suddenly have index sizes of around 300GB.
After a couple of seconds I have to restart my Tomcat, because the
slaves are no longer able to respond to queries...
Oddly, they have the same number of segments as the master, I can't
see errors in my logfile and the server load is normal.
What's wrong here? :)
Normal HTTP replication is used; these params are set on the master:
<str name="replicateAfter">commit</str>
<str name="replicateAfter">startup</str>
<str name="replicateAfter">optimize</str>

Any ideas?

Best regards
Vadim


Re: Solr 1.4, slaves hang after replication from an just optimized master

2012-06-19 Thread Vadim Kisselmann
Forgot to mention:
after the Tomcat restart, the slaves still have an index of 300GB.
After a manual replication command in the UI it is back to 100GB, like the
master, within a couple of seconds, and all is OK.



2012/6/19 Vadim Kisselmann v.kisselm...@googlemail.com:
 Hi folks,

 i have to look for an old live system with solr 1.4.
 When i optimize an bigger index with round about 200GB(after optimize
 and cut, 100GB) and my slaves
 replicate the newest version after(!) optimize, they hang(all) with
 100% in replication and they have at once circa 300GB index sizes.
 After a couple of seconds i have to restart my Tomcat, because the
 slaves are no longer be able to response on queries...
 Ironically, they have the same number of segments like master, i can't
 see errors in my logfile and the server load is normal.
 What's wrong here? :)
 Normal HTTP Replication is used, this params are set on master:
     <str name="replicateAfter">commit</str>
     <str name="replicateAfter">startup</str>
     <str name="replicateAfter">optimize</str>

 Any ideas?

 Best regards
 Vadim


Re: Poll: What do you use for Solr performance monitoring?

2012-05-31 Thread Vadim Kisselmann
Hi Otis,
done :) Till now we use Graphite, Ganglia and Zabbix. For our JVM
monitoring JStatsD.
Best regards
Vadim


2012/5/31 Otis Gospodnetic otis_gospodne...@yahoo.com:
 Hi,

 Super quick poll:  What do you use for Solr performance monitoring?
 Vote here: 
 http://blog.sematext.com/2012/05/30/poll-what-do-you-use-for-solr-performance-monitoring/


 I'm collecting data for my Berlin Buzzwords talk that will touch on Solr, so 
 your votes will be greatly appreciated!

 Thanks,
 Otis


Re: Weird query results with edismax and boolean operator +

2012-04-30 Thread Vadim Kisselmann
Hi Jan,
thanks for your response!

My qf parameter for edismax is: title. My
defaultSearchField is "text" in schema.xml.
In my app I generate a query with qf=title,text, so I think the
default parameters in config/schema should be overridden, right?

I found possibly 2 reasons for this behavior.
1. The mm parameter in solrconfig.xml for edismax is 0. 0 stands for
OR, but it should be AND = 100%.
2. I suspect that my app does not override my default qf.
I will test it today and report back, with my parsed query and all params.

Best regards
Vadim




2012/4/29 Jan Høydahl jan@cominvent.com:
 Hi,

 What is your qf parameter?
 Can you run the three queries with debugQuery=true&echoParams=all and attach 
 parsed query and all params? It will probably explain what is happening.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 27. apr. 2012, at 11:21, Vadim Kisselmann wrote:

 Hi folks,

 i use solr 4.0 from trunk, and edismax as standard query handler.
 In my schema i defined this:  <solrQueryParser defaultOperator="AND"/>

 I have this simple problem:

 nascar +author:serg* (3500 matches)

 +nascar +author:serg* (1 match)

 nascar author:serg* (5200 matches)

 nascar  AND author:serg* (1 match)

 I think i understand the query syntax, but this behavior confused me.
 Why this match-differences?

 By the way, i get in all matches at least one of my terms.
 But not always both.

 Best regards
 Vadim



Re: Weird query results with edismax and boolean operator +

2012-04-30 Thread Vadim Kisselmann
I tested it.
With the default qf=title text in solrconfig and mm=100%
I get the same result (1) for nascar AND author:serg* and +nascar
+author:serg*, great.
With nascar +author:serg* I get 3500 matches; in this case the
mm parameter seems not to work.

Here are my debug params for nascar AND author:serg*:

</str><str name="querystring">nascar AND author:serg*</str>
<str name="parsedquery">(+(+DisjunctionMaxQuery((text:nascar |
title:nascar)~0.01) +author:serg*))/no_coord</str>
<str name="parsedquery_toString">+(+(text:nascar | title:nascar)~0.01
+author:serg*)</str><lst name="explain"><str
name="com.bostonherald/news/international/europe/view/20120409russia_allows_anti-putin_demonstration_in_red_square">
8.235954 = (MATCH) sum of:
  8.10929 = (MATCH) max plus 0.01 times others of:
8.031613 = (MATCH) weight(text:nascar in 0) [DefaultSimilarity], result of:
  8.031613 = score(doc=0,freq=2.0 = termFreq=2.0
), product of:
0.84814763 = queryWeight, product of:
  6.6960144 = idf(docFreq=27, maxDocs=8335)
  0.12666455 = queryNorm
9.469594 = fieldWeight in 0, product of:
  1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
  6.6960144 = idf(docFreq=27, maxDocs=8335)
  1.0 = fieldNorm(doc=0)
7.7676363 = (MATCH) weight(title:nascar in 0) [DefaultSimilarity],
result of:
  7.7676363 = score(doc=0,freq=1.0 = termFreq=1.0
), product of:
0.9919093 = queryWeight, product of:
  7.830994 = idf(docFreq=8, maxDocs=8335)
  0.12666455 = queryNorm
7.830994 = fieldWeight in 0, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  7.830994 = idf(docFreq=8, maxDocs=8335)
  1.0 = fieldNorm(doc=0)
  0.12666455 = (MATCH) ConstantScore(author:serg*), product of:
1.0 = boost
0.12666455 = queryNorm
</str></lst>


And here for  nascar +author:serg*:
<str name="querystring">nascar +author:serg*</str>
<str name="parsedquery">(+(DisjunctionMaxQuery((text:nascar |
title:nascar)~0.01) +author:serg*))/no_coord</str>
<str name="parsedquery_toString">+((text:nascar | title:nascar)~0.01
+author:serg*)</str><lst name="explain"><str
name="com.bostonherald/news/international/europe/view/20120409russia_allows_anti-putin_demonstration_in_red_square">
8.235954 = (MATCH) sum of:
  8.10929 = (MATCH) max plus 0.01 times others of:
8.031613 = (MATCH) weight(text:nascar in 0) [DefaultSimilarity], result of:
  8.031613 = score(doc=0,freq=2.0 = termFreq=2.0
), product of:
0.84814763 = queryWeight, product of:
  6.6960144 = idf(docFreq=27, maxDocs=8335)
  0.12666455 = queryNorm
9.469594 = fieldWeight in 0, product of:
  1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
  6.6960144 = idf(docFreq=27, maxDocs=8335)
  1.0 = fieldNorm(doc=0)
7.7676363 = (MATCH) weight(title:nascar in 0) [DefaultSimilarity],
result of:
  7.7676363 = score(doc=0,freq=1.0 = termFreq=1.0
), product of:
0.9919093 = queryWeight, product of:
  7.830994 = idf(docFreq=8, maxDocs=8335)
  0.12666455 = queryNorm
7.830994 = fieldWeight in 0, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
  7.830994 = idf(docFreq=8, maxDocs=8335)
  1.0 = fieldNorm(doc=0)
  0.12666455 = (MATCH) ConstantScore(author:serg*), product of:
1.0 = boost
0.12666455 = queryNorm
</str>
<str name="mx.com.elsiglodetorreon/noticia/727525.sacerdotas.html">
0.063332275 = (MATCH) product of:
  0.12666455 = (MATCH) sum of:
0.12666455 = (MATCH) ConstantScore(author:serg*), product of:
  1.0 = boost
  0.12666455 = queryNorm
  0.5 = coord(1/2)
</str>


You can see that for the first doc in nascar +author:serg* all
query params match, but in the second doc only
ConstantScore(author:serg*) does.
With mm=100%, all query params should match.
http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/
http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html

Best regards
Vadim



2012/4/30 Vadim Kisselmann v.kisselm...@googlemail.com:
 Hi Jan,
 thanks for your response!

 My qf parameter for edismax is: title. My
 defaultSearchField=text in schema.xml.
 In my app i generate a query with qf=title,text, so i think the
 default parameters in config/schema should bei overridden, right?

 I found eventually 2 reasons for this behavior.
 1. mm-parameter in solrconfig.xml for edismax is 0. 0 stands for
 OR, but it should be an AND = 100%.
 2. I suppose that my app does not override my default-qf.
 I test it today and report, with my parsed query and all params.

 Best regards
 Vadim




 2012/4/29 Jan Høydahl jan@cominvent.com:
 Hi,

 What is your qf parameter?
 Can you run the three queries with debugQuery=true&echoParams=all and attach 
 parsed query and all params? It will probably explain what is happening.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr

Weird query results with edismax and boolean operator +

2012-04-27 Thread Vadim Kisselmann
Hi folks,

i use solr 4.0 from trunk, and edismax as standard query handler.
In my schema i defined this:  <solrQueryParser defaultOperator="AND"/>

I have this simple problem:

 nascar +author:serg* (3500 matches)

 +nascar +author:serg* (1 match)

 nascar author:serg* (5200 matches)

 nascar  AND author:serg* (1 match)

I think I understand the query syntax, but this behavior confuses me.
Why these differences in match counts?

By the way, in all matches I get at least one of my terms,
but not always both.

Best regards
Vadim


Re: Master config

2012-04-27 Thread Vadim Kisselmann
hi,
When only the slaves are used for search, why not; more RAM for the OS.
I keep the default settings on my master because, when my slaves are
busy with client queries,
I can test a few things on the master.

best regards
vadim



2012/4/27 Jamel ESSOUSSI jamel.essou...@gmail.com:
 Hi,

 I use two Solr slaves and one Solr master. Is it a good idea to disable all
 the caches on the master?

 Best Regards

 -- Jamel ESSOUSSI

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Master-config-tp3943648p3943648.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Localize the largest fields (content) in index

2012-03-29 Thread Vadim Kisselmann
Hi Erick,
thanks :)
The admin UI gives me the counts, so I can identify fields with big
bulks of unique terms.
I know this wiki page, but I read it one more time.
List of my file extensions with size in GB (index size ~150GB):
tvf 90GB
fdt 30GB
tim 18GB
prx 15GB
frq 12GB
tip 200MB
tvx 150MB

tvf is my biggest file extension.
Wiki: "This file contains, for each field that has a term vector
stored, a list of the terms, their frequencies and, optionally,
position and offset information."

Hmm, I use termVectors on my biggest fields because of MLT and highlighting.
But I think I should test my performance without termVectors. Good idea? :)

What do you think about my file extension sizes?

Best regards
Vadim




2012/3/29 Erick Erickson erickerick...@gmail.com:
 The admin UI (schema browser) will give you the counts of unique terms
 in your fields, which is where I'd start.

 I suspect you've already seen this page, but if not:
 http://lucene.apache.org/java/3_5_0/fileformats.html#file-names
 the .fdt and .fdx file extensions are where data goes when
 you set 'stored=true '. These files don't affect search speed,
 they just contain the verbatim copy of the data.

 The relative sizes of the various files above should give
 you a hint as to what's using the most space, but it'll be a bit
 of a hunt for you to pinpoint what's actually up. TermVectors
 and norms are often sources of using up space.

 Best
 Erick

 On Wed, Mar 28, 2012 at 10:55 AM, Vadim Kisselmann
 v.kisselm...@googlemail.com wrote:
 Hello folks,

 i work with Solr 4.0 r1292064 from trunk.
 My index grows fast, with 10Mio. docs i get an index size of 150GB
 (25% stored, 75% indexed).
 I want to find out, which fields(content) are too large, to consider 
 measures.

 How can i localize/discover the largest fields in my index?
 Luke(latest from trunk) doesn't work
 with my Solr version. I build Lucene/Solr .jars and tried to feed Luke
 this these, but i get many errors
 and can't build it.

 What other options do i have?

 Thanks and best regards
 Vadim


Re: Localize the largest fields (content) in index

2012-03-29 Thread Vadim Kisselmann
Yes, i think so, too :)
MLT doesn´t need termVectors really, but it´s faster with them. I
found out that
MLT works better on the title field in my case, instead of big text fields.
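A tiny SolrJ sketch of that setup (the seed document id and the thresholds are made up), asking the MoreLikeThis component to build its similarity query from the title field only:

import org.apache.solr.client.solrj.SolrQuery;

public class MltOnTitle {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("id:12345");   // hypothetical seed document
        q.set("mlt", "true");                      // enable the MoreLikeThis component
        q.set("mlt.fl", "title");                  // similarity computed on title only
        q.set("mlt.mintf", "1");
        q.set("mlt.mindf", "1");
        System.out.println(q);
    }
}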

Sharding is in planning, but my setup with SolrCloud, ZK and Tomcat
doesn´t work,
see here: 
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201203.mbox/%3CCA+GXEZE3LCTtgXFzn9uEdRxMymGF=z0ujb9s8b0qkipafn6...@mail.gmail.com%3E
I split my huge index (the 150GB index in this case is my test index) and
want to use SolrCloud,
but it´s not runnable with Tomcat at this time.

Best regards
Vadim


2012/3/29 Erick Erickson erickerick...@gmail.com:
 Yeah, it's worth a try. The term vectors aren't entirely necessary for
 highlighting,
 although they do make things more efficient.

 As far as MLT, does MLT really need such a big field?

 But you may be on your way to sharding your index if you remove this info
 and testing shows problems

 Best
 Erick

 On Thu, Mar 29, 2012 at 9:32 AM, Vadim Kisselmann
 v.kisselm...@googlemail.com wrote:
 Hi Erick,
 thanks:)
 The admin UI give me the counts, so i can identify fields with big
 bulks of unique terms.
 I known this wiki-page, but i read it one more time.
 List of my file extensions with size in GB(Index size ~150GB):
 tvf 90GB
 fdt 30GB
 tim 18GB
 prx 15GB
 frq 12GB
 tip 200MB
 tvx 150MB

 tvf is my biggest file extension.
 Wiki :This file contains, for each field that has a term vector
 stored, a list of the terms, their frequencies and, optionally,
 position and offest information.

 Hmm, i use termVectors on my biggest fields because of MLT and Highlighting.
 But i think i should test my performance without termVectors. Good Idea? :)

 What do you think about my file extension sizes?

 Best regards
 Vadim




 2012/3/29 Erick Erickson erickerick...@gmail.com:
 The admin UI (schema browser) will give you the counts of unique terms
 in your fields, which is where I'd start.

 I suspect you've already seen this page, but if not:
 http://lucene.apache.org/java/3_5_0/fileformats.html#file-names
 the .fdt and .fdx file extensions are where data goes when
 you set 'stored=true '. These files don't affect search speed,
 they just contain the verbatim copy of the data.

 The relative sizes of the various files above should give
 you a hint as to what's using the most space, but it'll be a bit
 of a hunt for you to pinpoint what's actually up. TermVectors
 and norms are often sources of using up space.

 Best
 Erick

 On Wed, Mar 28, 2012 at 10:55 AM, Vadim Kisselmann
 v.kisselm...@googlemail.com wrote:
 Hello folks,

 i work with Solr 4.0 r1292064 from trunk.
 My index grows fast, with 10Mio. docs i get an index size of 150GB
 (25% stored, 75% indexed).
 I want to find out, which fields(content) are too large, to consider 
 measures.

 How can i localize/discover the largest fields in my index?
 Luke(latest from trunk) doesn't work
 with my Solr version. I build Lucene/Solr .jars and tried to feed Luke
 this these, but i get many errors
 and can't build it.

 What other options do i have?

 Thanks and best regards
 Vadim


Localize the largest fields (content) in index

2012-03-28 Thread Vadim Kisselmann
Hello folks,

i work with Solr 4.0 r1292064 from trunk.
My index grows fast, with 10Mio. docs i get an index size of 150GB
(25% stored, 75% indexed).
I want to find out, which fields(content) are too large, to consider measures.

How can i localize/discover the largest fields in my index?
Luke(latest from trunk) doesn't work
with my Solr version. I build Lucene/Solr .jars and tried to feed Luke
this these, but i get many errors
and can't build it.

What other options do i have?

Thanks and best regards
Vadim


Re: SolrCloud with Tomcat and external Zookeeper, does it work?

2012-03-28 Thread Vadim Kisselmann
Hi Jerry,
thanks for your response:)
This thread(SolrCloud new...) is new for me, thanks!
How far are you with your setup? Which problems/errors do you have?
Best regards
Vadim




2012/3/27 jerry.min...@gmail.com jerry.min...@gmail.com:
 Hi Vadim,

 I too am experimenting with SolrCloud and need help with setting it up
 using Tomcat as the java servlet container.
 While searching for help on this question, I found another thread in
 the solr-mailing-list that is helpful.
 In case you haven't seen this thread that I found, please search the
 solr-mailing-list for: SolrCloud new
 You can also view it at nabble using this link:
 http://lucene.472066.n3.nabble.com/SolrCloud-new-td1528872.html

 Best,
 Jerry M.




 On Wed, Mar 21, 2012 at 5:51 AM, Vadim Kisselmann
 v.kisselm...@googlemail.com wrote:

 Hello folks,

 i read the SolrCloud Wiki and Bruno Dumon's blog entry with his First
 Exploration of SolrCloud.
 Examples and a first setup with embedded Jetty and ZK WORKS without problems.

 I tried to setup my own configuration with Tomcat and an external
 Zookeeper(my Master-ZK), but it doesn't work really.

 My setup:
 - latest Solr version from trunk
 - Tomcat 6
 - external ZK
 - Target: 1 Server, 1 Tomcat, 1 Solr instance, 2 collections with
 different config/schema

 What i tried:
 --
 1. After checkout i build solr(ant run-example), it works.
 ---
 2. I send my config/schema files to external ZK with Jetty:
 java -Djetty.port=8080 -Dbootstrap_confdir=/root/solrCloud/conf/
 -Dcollection.configName=conf1 -DzkHost=master-zk:2181 -jar start.jar
 it works, too.
 ---
 3. I create my (empty, without cores)solr.xml, like Bruno:
 http://www.ngdata.com/site/blog/57-ng.html#disqus_thread
 ---
 4. I started my Tomcat, and get the first error:
 in UI: This interface requires that you activate the admin request
 handlers, add the following configuration to your solrconfig.xml:
 !-- Admin Handlers - This will register all the standard admin
 RequestHandlers. --
 requestHandler name=/admin/ class=solr.admin.AdminHandlers /
 Admin request Handlers are definitely activated in my solrconfig.

 I get this error only with the latest trunk versions, with r1292064
 from February not. Sometimes it works with the new version, sometimes
 not and i get this error.

 --
 5. Ok, once it works after a few restarts, i changed my JAVA_OPTS for
 Tomcat and added this: -DzkHost=master-zk:2181
 Next Error:
 The web application [/solr2] appears to have started a thread
 named [main-SendThread(master-zk:2181)] but has failed to stop it.
 This is very likely to create a memory leak.
 Exception in thread Thread-2 java.lang.NullPointerException
 at 
 org.apache.solr.cloud.Overseer$CloudStateUpdater.amILeader(Overseer.java:179)
 at org.apache.solr.cloud.Overseer$CloudStateUpdater.run(Overseer.java:104)
 at java.lang.Thread.run(Thread.java:662)
 15.03.2012 13:25:17 org.apache.catalina.loader.WebappClassLoader loadClass
 INFO: Illegal access: this web application instance has been stopped
 already. Could not load org.apache.zookeeper.server.ZooTrace. The
 eventual following stack trace is caused by an error thrown for
 debugging purposes as well as to attempt to terminate the thread which
 caused the illegal access, and has no functional impact.
 java.lang.IllegalStateException
 at 
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1531)
 at 
 org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1491)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1196)
 15.03.2012 13:25:17 org.apache.coyote.http11.Http11Protocol destroy

 -
 6. Ok, assuming the first steps work, i would create new
 cores and my 2 collections. My requests with the CoreAdminHandler are ok,
 my solr.xml looks like this:
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="true">
  <cores adminPath="/admin/cores" zkClientTimeout="1" hostPort="8080"
 hostContext="solr">
    <core
       name="shard1_data"
       collection="col1"
       shard="shard1"
       instanceDir="xxx/" />
  <core
       name="shard2_data"
       collection="col2"
       shard="shard2"
       instanceDir="xx2/" />
  </cores>
 </solr>

 Now i get the following exception: ...couldn't find conf name for
 collection1...
 I don't have a collection1. Why this exception?

 ---
 You can see, there are too many exceptions and possibly
 configuration problems with Tomcat and an external ZK.
 Has anyone set up an identical configuration and does it work?
 Does anyone detect mistakes in my configuration steps?

 Best regards
 Vadim


SolrCloud with Tomcat and external Zookeeper, does it work?

2012-03-21 Thread Vadim Kisselmann
Hello folks,

i read the SolrCloud Wiki and Bruno Dumon's blog entry with his First
Exploration of SolrCloud.
Examples and a first setup with embedded Jetty and ZK WORKS without problems.

I tried to setup my own configuration with Tomcat and an external
Zookeeper(my Master-ZK), but it doesn't work really.

My setup:
- latest Solr version from trunk
- Tomcat 6
- external ZK
- Target: 1 Server, 1 Tomcat, 1 Solr instance, 2 collections with
different config/schema

What i tried:
--
1. After checkout i build solr(ant run-example), it works.
---
2. I send my config/schema files to external ZK with Jetty:
java -Djetty.port=8080 -Dbootstrap_confdir=/root/solrCloud/conf/
-Dcollection.configName=conf1 -DzkHost=master-zk:2181 -jar start.jar
it works, too.
---
3. I create my (empty, without cores)solr.xml, like Bruno:
http://www.ngdata.com/site/blog/57-ng.html#disqus_thread
---
4. I started my Tomcat, and get the first error:
in UI: This interface requires that you activate the admin request
handlers, add the following configuration to your solrconfig.xml:
<!-- Admin Handlers - This will register all the standard admin
RequestHandlers. -->
<requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
Admin request Handlers are definitely activated in my solrconfig.

I get this error only with the latest trunk versions, with r1292064
from February not. Sometimes it works with the new version, sometimes
not and i get this error.

--
5. Ok, once it works after a few restarts, i changed my JAVA_OPTS for
Tomcat and added this: -DzkHost=master-zk:2181
Next Error:
The web application [/solr2] appears to have started a thread
named [main-SendThread(master-zk:2181)] but has failed to stop it.
This is very likely to create a memory leak.
Exception in thread Thread-2 java.lang.NullPointerException
at org.apache.solr.cloud.Overseer$CloudStateUpdater.amILeader(Overseer.java:179)
at org.apache.solr.cloud.Overseer$CloudStateUpdater.run(Overseer.java:104)
at java.lang.Thread.run(Thread.java:662)
15.03.2012 13:25:17 org.apache.catalina.loader.WebappClassLoader loadClass
INFO: Illegal access: this web application instance has been stopped
already. Could not load org.apache.zookeeper.server.ZooTrace. The
eventual following stack trace is caused by an error thrown for
debugging purposes as well as to attempt to terminate the thread which
caused the illegal access, and has no functional impact.
java.lang.IllegalStateException
at 
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1531)
at 
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1491)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1196)
15.03.2012 13:25:17 org.apache.coyote.http11.Http11Protocol destroy

-
6. Ok, assuming the first steps work, i would create new
cores and my 2 collections. My requests with the CoreAdminHandler are ok,
my solr.xml looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" zkClientTimeout="1" hostPort="8080"
hostContext="solr">
<core
   name="shard1_data"
   collection="col1"
   shard="shard1"
   instanceDir="xxx/" />
 <core
   name="shard2_data"
   collection="col2"
   shard="shard2"
   instanceDir="xx2/" />
  </cores>
</solr>

Now i get the following exception: ...couldn't find conf name for
collection1...
I don't have a collection1. Why this exception?

---
You can see, there are too many exceptions and possibly
configuration problems with Tomcat and an external ZK.
Has anyone set up an identical configuration and does it work?
Does anyone detect mistakes in my configuration steps?

Best regards
Vadim


Re: whethere solr 3.3 index file is compatable with solr 4.0

2012-03-21 Thread Vadim Kisselmann
you have to re-index your data.

best regards
vadim


2012/3/21 syed kather in.ab...@gmail.com:
 Team

 I have indexed my data with solr 3.3 version , As I need to use
 hierarchical facets features from solr 4.0 .
 Can I use the existing data with Solr 4.0 version or should need to
 re-index the data with new version?



            Thanks and Regards,
        S SYED ABDUL KATHER


Solr 4.0 and tomcat, error in new admin UI

2012-03-15 Thread Vadim Kisselmann
Hi folks,

i comment this issue : https://issues.apache.org/jira/browse/SOLR-3238 ,
but i want to ask here if anyone have the same problem.


I use Solr 4.0 from trunk(latest) with tomcat6.

I get an error in New Admin UI:

This interface requires that you activate the admin request handlers,
add the following configuration to your solrconfig.xml:
<!-- Admin Handlers - This will register all the standard admin
RequestHandlers. -->
<requestHandler name="/admin/" class="solr.admin.AdminHandlers" />

Admin request Handlers are definitely activated in my solrconfig.

A problem with tomcat?
It works with embedded jetty, but i should use tomcat.

Best Regards
Vadim


Re: Apache Lucene Eurocon 2012

2012-03-08 Thread Vadim Kisselmann
Hi Chris,

thanks for your response.Ok, we will wait :)

Best Regards
Vadim




2012/3/8 Chris Hostetter hossman_luc...@fucit.org


 : where and when is the next Eurocon scheduled?
 : I read something about denmark and autumn 2012(i don't know where *g*).

 I do not know where, but sometime in the fall is probably the correct time
 frame.  I beleive the details will be announced at Lucene Revolution...

http://lucenerevolution.org/

 (that's what happened last year)

 -Hoss



Apache Lucene Eurocon 2012

2012-03-06 Thread Vadim Kisselmann
Hi folks,

where and when is the next Eurocon scheduled?
I read something about denmark and autumn 2012(i don't know where *g*).

Best regards and thanks
Vadim


Re: maxClauseCount Exception

2012-02-28 Thread Vadim Kisselmann
Set maxBooleanClauses in your solrconfig.xml higher, default is 1024.
Your query exceeds this limit.
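Under the hood this limit is Lucene's BooleanQuery.maxClauseCount, which Solr's maxBooleanClauses setting feeds, and in your stack trace it is the highlighter that hits it while rewriting the multi-term query into one clause per matching term. A small standalone Lucene 3.x-style sketch (not your Solr setup) showing the limit and what raising it means:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class ClauseLimitDemo {
    public static void main(String[] args) {
        // 1024 is the Lucene default; Solr's maxBooleanClauses sets this value
        System.out.println("limit: " + BooleanQuery.getMaxClauseCount());

        BooleanQuery bq = new BooleanQuery();
        try {
            // adding more clauses than the limit throws TooManyClauses, the same
            // exception the highlighter runs into when it expands a wildcard query
            for (int i = 0; i < 2000; i++) {
                bq.add(new TermQuery(new Term("text", "term" + i)),
                       BooleanClause.Occur.SHOULD);
            }
        } catch (BooleanQuery.TooManyClauses e) {
            System.out.println("hit the limit at clause " + bq.clauses().size());
        }

        // raising the limit, which is what a bigger maxBooleanClauses does
        BooleanQuery.setMaxClauseCount(4096);
    }
}
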
Regards
Vadim



2012/2/22 Darren Govoni dar...@ontrenet.com

 Hi,
  I am suddenly getting a maxClauseCount exception for no reason. I am
 using Solr 3.5. I have only 206 documents in my index.

 Any ideas? This is wierd.

 QUERY PARAMS: [hl, hl.snippets, hl.simple.pre, hl.simple.post, fl,
 hl.mergeContiguous, hl.usePhraseHighlighter, hl.requireFieldMatch,
 echoParams, hl.fl, q, rows, start]|#]


 [#|2012-02-22T13:40:13.129-0500|INFO|glassfish3.1.1|
 org.apache.solr.core.SolrCore|_ThreadID=22;_ThreadName=Thread-2;|[]
 webapp=/solr3 path=/select
 params={hl=truehl.snippets=4hl.simple.pre=b/bfl=*,scorehl.mergeContiguous=truehl.usePhraseHighlighter=truehl.requireFieldMatch=trueechoParams=allhl.fl=text_tq={!lucene+q.op%3DOR+df%3Dtext_t}+(+kind_s:doc+OR+kind_s:xml)+AND+(type_s:[*+TO+*])+AND+(usergroup_sm:admin)rows=20start=0wt=javabinversion=2}
 hits=204 status=500 QTime=166 |#]


 [#|2012-02-22T13:40:13.131-0500|SEVERE|glassfish3.1.1|
 org.apache.solr.servlet.SolrDispatchFilter|
 _ThreadID=22;_ThreadName=Thread-2;|org.apache.lucene.search.BooleanQuery
 $TooManyClauses: maxClauseCount is set to 1024
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:136)
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:127)
at org.apache.lucene.search.ScoringRewrite
 $1.addClause(ScoringRewrite.java:51)
at org.apache.lucene.search.ScoringRewrite
 $1.addClause(ScoringRewrite.java:41)
at org.apache.lucene.search.ScoringRewrite
 $3.collect(ScoringRewrite.java:95)
at

 org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:38)
at
 org.apache.lucene.search.ScoringRewrite.rewrite(ScoringRewrite.java:93)
at
 org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:304)
at

 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:158)
at

 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:98)
at

 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:385)
at

 org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:217)
at
 org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:185)
at

 org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:205)
at

 org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:490)
at

 org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
at

 org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
at org.apache.so




Custom Query Component: parameters are not appended to query

2012-02-17 Thread Vadim Kisselmann
Hello folks,

I built a simple custom component for the "hl.q" query.
My case was to inject hl.q params on the fly, filtering out field clauses
(like language:de) that were part of my
standard query. Those were being highlighted, because Solr/Lucene have no way of
interpreting an extended q clause and saying this part is a query and
should be highlighted and
this part isn't.
If it works, the community can have it :)

Facts:  q=roomba AND irobot AND language:de

My component extends SearchComponent. I use ResponseBuilder to get
all needed params
like field-names from schema, q-params, etc…



My component is called first (verified via debugging and debugQuery) from my
SearchHandler:
<arr name="first-components">
<str>highlightQuery</str>
 </arr>



Important Clippings from Sourcecode:

public class HighlightQueryComponent extends SearchComponent {

…….
…….

public void process(ResponseBuilder rb) throws IOException {

   if (rb.doHighlights) {

List<String> terms = new ArrayList<String>(0);
  SolrQueryRequest req = rb.req;
IndexSchema schema = req.getSchema();
Map<String, SchemaField> fields = schema.getFields();
  SolrParams params = req.getParams();
…..
….
…magic
…
….
Query hlq = new TermQuery(new Term("text", hlQuery.toString()));
rb.setHighlightQuery(hlq);   // hlq = text:(roomba AND irobot)



Problem:
In the last step my query is adjusted (the hlq value from debugging is
"text:(roomba AND irobot)"). It looks fine, the magic in the process() method
works.
But nothing happens. If I continue to debug, the next components are called,
but my query stays the same, without changes.
Either setHighlightQuery doesn´t work, or my params are overridden in
following components.
What can it be?

Best Regards
Vadim


Re: How to reindex about 10Mio. docs

2012-02-09 Thread Vadim Kisselmann
Hi Otis,
thanks for your response:)
We found a solution yesterday. It works with a Ruby script, curl and Saxon/XSLT.
The performance is great. We moved all the docs in 5 batches to
prevent an overload of our machines.
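For anyone who prefers to stay in Java, a rough SolrJ sketch of the same idea (this is not the Ruby/XSLT pipeline above; host URLs, batch size and the assumption that all needed fields are stored are mine). It pages through the source index and re-adds the documents to the target, which also survives a schema type change like sint to slong as long as the stored values still parse:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class CopyIndex {
    public static void main(String[] args) throws Exception {
        // use a SolrJ release compatible with your servers (1.4.1 here)
        SolrServer src = new CommonsHttpSolrServer("http://old-host:8983/solr");
        SolrServer dst = new CommonsHttpSolrServer("http://new-host:8983/solr");
        int rows = 1000;
        for (int start = 0; ; start += rows) {
            SolrQuery q = new SolrQuery("*:*").setStart(start).setRows(rows);
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (SolrDocument d : src.query(q).getResults()) {
                SolrInputDocument in = new SolrInputDocument();
                for (String f : d.getFieldNames()) {
                    in.addField(f, d.getFieldValue(f)); // only stored fields come across
                }
                batch.add(in);
            }
            if (batch.isEmpty()) break;
            dst.add(batch);
        }
        dst.commit();
    }
}
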
Best regards
Vadim



2012/2/8 Otis Gospodnetic otis_gospodne...@yahoo.com:
 Vadim,

 Would using xslt output help?

 Otis
 
 Performance Monitoring SaaS for Solr - 
 http://sematext.com/spm/solr-performance-monitoring/index.html




 From: Vadim Kisselmann v.kisselm...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, February 8, 2012 7:09 AM
Subject: Re: How to reindex about 10Mio. docs

Another problem appeared ;)
how can i export my docs in csv-format?
In Solr 3.1+ i can use the query-param wt=csv, but in Solr 1.4.1?
Best Regards
Vadim


2012/2/8 Vadim Kisselmann v.kisselm...@googlemail.com:
 Hi Ahmet,
 thanks for quick response:)
 I've already thought the same...
 And it will be a pain to export and import this huge doc-set as CSV.
 Do i have an another solution?
 Regards
 Vadim


 2012/2/8 Ahmet Arslan iori...@yahoo.com:
 i want to reindex about 10Mio. Docs. from one Solr(1.4.1) to
 another
 Solr(1.4.1).
 I changed my schema.xml (field types sing to slong),
 standard
 replication would fail.
 what is the fastest and smartest way to manage this?
 this here sound great (EntityProcessor):
 http://www.searchworkings.org/blog/-/blogs/importing-data-from-another-solr
 But would it work with Solr 1.4.1?

 SolrEntityProcessor is not available in 1.4.1. I would dump stored fields 
 into comma separated file, and use http://wiki.apache.org/solr/UpdateCSV 
 to feed into new solr instance.





How to reindex about 10Mio. docs

2012-02-08 Thread Vadim Kisselmann
Hello folks,

i want to reindex about 10Mio. Docs. from one Solr(1.4.1) to another
Solr(1.4.1).
I changed my schema.xml (field type sint to slong), so standard
replication would fail.
what is the fastest and smartest way to manage this?
this here sound great (EntityProcessor):
http://www.searchworkings.org/blog/-/blogs/importing-data-from-another-solr
But would it work with Solr 1.4.1?

Best Regards
Vadim


Re: How to reindex about 10Mio. docs

2012-02-08 Thread Vadim Kisselmann
Hi Ahmet,
thanks for quick response:)
I've already thought the same...
And it will be a pain to export and import this huge doc-set as CSV.
Do i have another solution?
Regards
Vadim


2012/2/8 Ahmet Arslan iori...@yahoo.com:
 i want to reindex about 10Mio. Docs. from one Solr(1.4.1) to
 another
 Solr(1.4.1).
 I changed my schema.xml (field types sing to slong),
 standard
 replication would fail.
 what is the fastest and smartest way to manage this?
 this here sound great (EntityProcessor):
 http://www.searchworkings.org/blog/-/blogs/importing-data-from-another-solr
 But would it work with Solr 1.4.1?

 SolrEntityProcessor is not available in 1.4.1. I would dump stored fields 
 into comma separated file, and use http://wiki.apache.org/solr/UpdateCSV to 
 feed into new solr instance.


Re: How to reindex about 10Mio. docs

2012-02-08 Thread Vadim Kisselmann
Another problem appeared ;)
how can i export my docs in csv-format?
In Solr 3.1+ i can use the query-param wt=csv, but in Solr 1.4.1?
Best Regards
Vadim


2012/2/8 Vadim Kisselmann v.kisselm...@googlemail.com:
 Hi Ahmet,
 thanks for quick response:)
 I've already thought the same...
 And it will be a pain to export and import this huge doc-set as CSV.
 Do i have an another solution?
 Regards
 Vadim


 2012/2/8 Ahmet Arslan iori...@yahoo.com:
 i want to reindex about 10Mio. Docs. from one Solr(1.4.1) to
 another
 Solr(1.4.1).
 I changed my schema.xml (field types sing to slong),
 standard
 replication would fail.
 what is the fastest and smartest way to manage this?
 this here sound great (EntityProcessor):
 http://www.searchworkings.org/blog/-/blogs/importing-data-from-another-solr
 But would it work with Solr 1.4.1?

 SolrEntityProcessor is not available in 1.4.1. I would dump stored fields 
 into comma separated file, and use http://wiki.apache.org/solr/UpdateCSV to 
 feed into new solr instance.


Re: Edismax, Filter Query and Highlighting

2012-02-01 Thread Vadim Kisselmann
hl.q works:)
But i have to attach the hl.q to my standard query.
In bigger queries it would be a pain to find out which terms i need in my hl.q.
My plan: an own query parser in Solr, which loops through q, identifies
filter terms (in my case language:de) and appends
the rest as hl.q to the standard query. Sounds like a plan? :)
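Until such a parser exists, the hl.q value can simply be set per request; a minimal SolrJ sketch (handler and field names are assumed) where only the real search terms go into hl.q, so the language filter never reaches the highlighter:

import org.apache.solr.client.solrj.SolrQuery;

public class HlqExample {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("(roomba OR irobot) AND language:de");
        q.setHighlight(true);
        q.set("hl.fl", "text,title,url");
        // hl.q overrides q for highlighting only, so language:de is not marked up
        q.set("hl.q", "roomba OR irobot");
        System.out.println(q);
    }
}
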
Best Regards
Vadim





2012/2/1 Koji Sekiguchi k...@r.email.ne.jp:
 (12/02/01 4:28), Vadim Kisselmann wrote:

 Hmm, i don´t know, but i can test it tomorrow at work.
 i´m not sure about the right syntax with hl.q. (?)
 but i report :)


 hl.q can accept same syntax of q, including local params.

 koji
 --
 http://www.rondhuit.com/en/


Edismax, Filter Query and Highlighting

2012-01-31 Thread Vadim Kisselmann
Hi,

i have problems with edismax, filter queries and highlighting.

First of all: can edismax deal with filter queries?

My case:
Edismax is my default requestHandler.
My query in SolrAdminGUI: (roomba OR irobot) AND language:de

You can see that my q is roomba OR irobot and my fq is
language:de (language is a field in schema.xml).
 With these params i turn highlighting on: hl=true&hl.fl=text,title,url

In the shown result you can see that highlighting matched on
<em>de</em> in the url (last arr).

<lst
name="de.blog-gedanken/produkte/erste-erfahrung-mit-unserem-roomba-roboter-staubsauger">
<arr name="title"><str>Erste Erfahrung mit unserem <em>Roomba</em>
Roboter Staubsauger</str></arr>
<arr name="text"><str>
 Erste Erfahrung mit unserem <em>Roomba</em> Roboter Staubsauger
 Tags: Haushaltshilfe, Roboter</str></arr>
<arr
name="url"><str>http://www.blog-gedanken.<em>de</em>/produkte/erste-erfahrung-mit-unserem-<em>roomba</em>-roboter-staubsauger/</str></arr></lst>

in catalina.out i can see the following query:
path=/select/
params={hl=true&version=2.2&indent=on&rows=10&start=0&q=(roomba+OR+irobot)+AND+language:de}
hits=1 status=0 QTime=65

language:de is a filter, and shouldn't be highlighted.
Do i have a thinking error, or is my query wrong? Or is it an edismax problem?

Best Regards
Vadim


Re: Edismax, Filter Query and Highlighting

2012-01-31 Thread Vadim Kisselmann
Hi Ahmet,

thanks for quick response :)
I've also noticed this mistake.
I wonder why the query itself works at all.
For example: query = language:de
I get results which only have language:de.
The fq also works and i get only the de results in my field language.
I can't understand the behavior. It seems like the fq works, but in
the end my fq params are converted to q params.

Regards
Vadim



2012/1/31 Ahmet Arslan iori...@yahoo.com:
 in calalina.out i can see the following query:
 path=/select/
 params={hl=trueversion=2.2indent=onrows=10start=0q=(roomba+OR+irobot)+AND+language:de}
 hits=1 status=0 QTime=65

 language:de is a filter, and shouldn't be highlighted.
 Do i have a thinking error, or is my query wrong? Or is it
 an edismax problem?

 In your example, language:de is a part of query. Use fq= instead.
 q=(roomba OR irobot)&fq=language:de



Re: Edismax, Filter Query and Highlighting

2012-01-31 Thread Vadim Kisselmann
...</double><lst name="prepare"><double name="time">0.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst></lst>
<lst name="process"><double name="time">15.0</double>
<lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">8.0</double></lst>
<lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
<lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">7.0</double></lst></lst></lst></lst>

I hope you can read it:)

Best Regards
Vadim





2012/1/31 Erick Erickson erickerick...@gmail.com:
 Seeing the results with debugQuery=on would help.

 No, fq does NOT get translated into q params, it's a
 completely separate mechanism so I'm not quite sure
 what you're seeing.

 Best
 Erick

 On Tue, Jan 31, 2012 at 8:40 AM, Vadim Kisselmann
 v.kisselm...@googlemail.com wrote:
 Hi Ahmet,

 thanks for quick response :)
 I've also discovered this failure.
 I wonder that the query themselves works.
 For example: query = language:de
 I get results which only have language:de.
 Also works the fq and i get only the de-result in my field language.
 I can't understand the behavior. It seems like the fq works, but at
 the end my fq-params be converted to q-params.

 Regards
 Vadim



 2012/1/31 Ahmet Arslan iori...@yahoo.com:
 in calalina.out i can see the following query:
 path=/select/
 params={hl=trueversion=2.2indent=onrows=10start=0q=(roomba+OR+irobot)+AND+language:de}
 hits=1 status=0 QTime=65

 language:de is a filter, and shouldn't be highlighted.
 Do i have a thinking error, or is my query wrong? Or is it
 an edismax problem?

 In your example, language:de is a part of query. Use fq= instead.
 q=(roomba OR irobot)&fq=language:de



Re: Edismax, Filter Query and Highlighting

2012-01-31 Thread Vadim Kisselmann
Hi Erick,

 I didn't read your first post carefully enough, I was keying
 on the words filter query. Your query does not have
 any filter queries! I thought you were talking
 about fq=language:de type clauses, which is what
 I was responding to.

no problem, i understand:)

 Solr/Lucene have no way of
 interpreting an extended q clause and saying
 this part is a query and should be highlighted and
 this part isn't.

 Try the fq option maybe?

I thought so, unfortunately.
fq will be the only option. I should rebuild my application :)
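For completeness, the fq variant in SolrJ (again just a sketch with an assumed field list): the filter is sent separately from q, so it is cached independently and never gets highlighted:

import org.apache.solr.client.solrj.SolrQuery;

public class FqExample {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("roomba OR irobot");
        q.addFilterQuery("language:de");   // restricts results, not highlighted
        q.setHighlight(true);
        q.set("hl.fl", "text,title,url");
        System.out.println(q);
    }
}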

Best Regards
Vadim


Re: Edismax, Filter Query and Highlighting

2012-01-31 Thread Vadim Kisselmann
Hmm, i don´t know, but i can test it tomorrow at work.
i´m not sure about the right syntax with hl.q. (?)
but i will report back :)




2012/1/31 Ahmet Arslan iori...@yahoo.com:
  Try the fq option maybe?

 I thought so, unfortunately.
 fq will be the only option. I should rebuild my
 application :)

 Could hl.q help? http://wiki.apache.org/solr/HighlightingParameters#hl.q


Re: Solr 3.5.0 can't find Carrot classes

2012-01-27 Thread Vadim Kisselmann
Hi Christopher,
when all needed jars are included, you can only have wrong paths in
your solrconfig.xml
Regards
Vadim



2012/1/26 Stanislaw Osinski stanislaw.osin...@carrotsearch.com:
 Hi,

 Can you paste the logs from the second run?

 Thanks,

 Staszek

 On Wed, Jan 25, 2012 at 00:12, Christopher J. Bottaro cjbott...@onespot.com
 wrote:

 On Tuesday, January 24, 2012 at 3:07 PM, Christopher J. Bottaro wrote:
  SEVERE: java.lang.NoClassDefFoundError:
 org/carrot2/core/ControllerFactory
          at
 org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.init(CarrotClusteringEngine.java:102)
          at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
          at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown
 Source)
          at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
          at java.lang.reflect.Constructor.newInstance(Unknown Source)
          at java.lang.Class.newInstance0(Unknown Source)
          at java.lang.Class.newInstance(Unknown Source)
 
  …
 
  I'm starting Solr with -Dsolr.clustering.enabled=true and I can see that
 the Carrot jars in contrib are getting loaded.
 
  Full log file is here:
 http://onespot-development.s3.amazonaws.com/solr.log
 
  Any ideas?  Thanks for the help.
 
 Ok, got a little further.  Seems that Solr doesn't like it if you include
 jars more than once (I had a lib dir and also lib directives in the
 solrconfig which ended up loading the same jars twice).

 But now I'm getting these errors:  java.lang.NoClassDefFoundError:
 org/apache/solr/handler/clustering/SearchClusteringEngine

 Any help?  Thanks.


decreasing of maxFieldLength in solrconfig.xml doesn't work

2012-01-26 Thread Vadim Kisselmann
Hello Folks,
i want to decrease the max. number of terms for my fields to 500.
I thought that the maxFieldLength parameter in solrconfig.xml is
intended for this.
In my case it doesn't work.

Half of my text fields contain longer text (about 1 words).
With 100 docs in my index i had a segment size of 1140KB for indexed
data and 270KB for stored data (.fdx, .fdt).
After changing the default maxFieldLength to
<maxFieldLength>500</maxFieldLength>,
deleting the index folder, restarting Tomcat and reindexing, i see the same
segment sizes (1140KB for indexed and 270KB for stored data).

Please tell me if I made an error in reasoning.

Regards
Vadim


Re: decreasing of maxFieldLength in solrconfig.xml doesn't work

2012-01-26 Thread Vadim Kisselmann
P.S.:
i use Solr 4.0 from trunk.
Is maxFieldLength deprecated in Solr 4.0 ?
If so, do i have an alternative to decrease the number of terms during indexing?
Regards
Vadim



2012/1/26 Vadim Kisselmann v.kisselm...@googlemail.com:
 Hello Folks,
 i want to decrease the max. number of terms for my fields to 500.
 I thought what the maxFieldLength parameter in solrconfig.xml is
 intended for this.
 In my case it doesn't work.

 The half of my text fields includes longer text(about 1 words).
 With 100 docs in my index i had an segment size of 1140KB for indexed
 data and 270KB for stored data (.fdx, .fdt).
 After a change from default maxFieldLength1/maxFieldLength to
 maxFieldLength500/maxFieldLength,
 delete(index folder), restarting Tomcat and reindex, i see the same
 segment sizes (1140KB for indexed and 270KB for stored data).

 Please tell me if I made an error in reasoning.

 Regards
 Vadim


Re: decreasing of maxFieldLength in solrconfig.xml doesn't work

2012-01-26 Thread Vadim Kisselmann
Sean, Ahmet,
thanks for response:)

I use Solr 4.0 from trunk.
In my solrconfig.xml there is only one maxFieldLength param.
I think it is deprecated in Solr versions 3.5+...

But LimitTokenCountFilterFactory works in my case :)
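For anyone wondering what that factory does underneath, a small standalone Lucene 3.x-style sketch of the same token-count cap (the analyzer choice and the 500 limit are just examples, not taken from the schema above):

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LimitTokenCountAnalyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class LimitTokensDemo {
    public static void main(String[] args) throws Exception {
        // wraps any analyzer and stops emitting tokens after 500, roughly what
        // LimitTokenCountFilterFactory does per field in schema.xml
        Analyzer base = new StandardAnalyzer(Version.LUCENE_35);
        Analyzer limited = new LimitTokenCountAnalyzer(base, 500);

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) sb.append("word").append(i).append(' ');

        TokenStream ts = limited.tokenStream("text", new StringReader(sb.toString()));
        ts.reset();
        int count = 0;
        while (ts.incrementToken()) {
            count++;
        }
        ts.end();
        ts.close();
        System.out.println("indexed tokens: " + count);  // at most 500
    }
}
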
Thanks!

Regards
Vadim



2012/1/26 Ahmet Arslan iori...@yahoo.com:
 i want to decrease the max. number of terms for my fields to
 500.
 I thought what the maxFieldLength parameter in
 solrconfig.xml is
 intended for this.
 In my case it doesn't work.

 The half of my text fields includes longer text(about 1
 words).
 With 100 docs in my index i had an segment size of 1140KB
 for indexed
 data and 270KB for stored data (.fdx, .fdt).
 After a change from default
 maxFieldLength1/maxFieldLength to
 maxFieldLength500/maxFieldLength,
 delete(index folder), restarting Tomcat and reindex, i see
 the same
 segment sizes (1140KB for indexed and 270KB for stored
 data).

 Please tell me if I made an error in reasoning.

 What version of solr are you using?

 Could it be 
 http://lucene.apache.org/solr/api/org/apache/solr/analysis/LimitTokenCountFilterFactory.html?

 http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/analysis/LimitTokenCountFilter.html


Re: Size of index to use shard

2012-01-24 Thread Vadim Kisselmann
Hi,
it depends on your hardware.
Read this:
http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/
Think about your cache-config (few updates, big caches) and a good
HW-infrastructure.
In my case i can handle a 250GB index with 100mil. docs on an i7
machine with RAID10 and 24GB RAM; query times stay under 1 sec.
Regards
Vadim



2012/1/24 Anderson vasconcelos anderson.v...@gmail.com:
 Hi
 Has some size of index (or number of docs) that is necessary to break
 the index in shards?
 I have a index with 100GB of size. This index increase 10GB per year.
 (I don't have information how many docs they have) and the docs never
 will be deleted.  Thinking in 30 years, the index will be with 400GB
 of size.

 I think  is not required to break in shard, because i not consider
 this like a large index. Am I correct? What's is a real large
 index


 Thanks


Re: Size of index to use shard

2012-01-24 Thread Vadim Kisselmann
@Erick
thanks:)
i agree with your opinion.
my load tests show the same.

@Dmitry
my docs are small too, i think about 3-15KB per doc.
i update my index all the time and i have an average of 20-50 requests
per minute (20% facet queries, 80% large boolean queries with
wildcard/fuzzy). How many docs at a time? It depends on the chosen
filters, from 10 up to all 100Mio.
I work with very small caches (strangely, but if my index is under
100GB i need larger caches, over 100GB smaller caches..)
My JVM has 6GB, 18GB for I/O.
With few updates a day i would configure very big caches, like Tom
Burton-West (see HathiTrust´s blog)

Regards Vadim



2012/1/24 Anderson vasconcelos anderson.v...@gmail.com:
 Thanks for the explanation Erick :)

 2012/1/24, Erick Erickson erickerick...@gmail.com:
 Talking about index size can be very misleading. Take
 a look at http://lucene.apache.org/java/3_5_0/fileformats.html#file-names.
 Note that the *.fdt and *.fdx files are used to for stored fields, i.e.
 the verbatim copy of data put in the index when you specify
 stored=true. These files have virtually no impact on search
 speed.

 So, if your *.fdx and *.fdt files are 90G out of a 100G index
 it is a much different thing than if these files are 10G out of
 a 100G index.

 And this doesn't even mention the peculiarities of your query mix.
 Nor does it say a thing about whether your cheapest alternative
 is to add more memory.

 Anderson's method is about the only reliable one, you just have
 to test with your index and real queries. At some point, you'll
 find your tipping point, typically when you come under memory
 pressure. And it's a balancing act between how much memory
 you allocate to the JVM and how much you leave for the op
 system.

 Bottom line: No hard and fast numbers. And you should periodically
 re-test the empirical numbers you *do* arrive at...

 Best
 Erick

 On Tue, Jan 24, 2012 at 5:31 AM, Anderson vasconcelos
 anderson.v...@gmail.com wrote:
 Apparently, not so easy to determine when to break the content into
 pieces. I'll investigate further about the amount of documents, the
 size of each document and what kind of search is being used. It seems,
 I will have to do a load test to identify the cutoff point to begin
 using the strategy of shards.

 Thanks

 2012/1/24, Dmitry Kan dmitry@gmail.com:
 Hi,

 The article you gave mentions 13GB of index size. It is quite small index
 from our perspective. We have noticed, that at least solr 3.4 has some
 sort
 of choking point with respect to growing index size. It just becomes
 substantially slower than what we need (a query on avg taking more than
 3-4
 seconds) once index size crosses a magic level (about 80GB following our
 practical observations). We try to keep our indices at around 60-70GB for
 fast searches and above 100GB for slow ones. We also route majority of
 user
 queries to fast indices. Yes, caching may help, but not necessarily we
 can
 afford adding more RAM for bigger indices. BTW, our documents are very
 small, thus in 100GB index we can have around 200 mil. documents. It
 would
 be interesting to see, how you manage to ensure q-times under 1 sec with
 an
 index of 250GB? How many documents / facets do you ask max. at a time?
 FYI,
 we ask for a thousand of facets in one go.

 Regards,
 Dmitry

 On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann 
 v.kisselm...@googlemail.com wrote:

 Hi,
 it depends from your hardware.
 Read this:

 http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/
 Think about your cache-config (few updates, big caches) and a good
 HW-infrastructure.
 In my case i can handle a 250GB index with 100mil. docs on a I7
 machine with RAID10 and 24GB RAM = q-times under 1 sec.
 Regards
 Vadim



 2012/1/24 Anderson vasconcelos anderson.v...@gmail.com:
  Hi
  Has some size of index (or number of docs) that is necessary to break
  the index in shards?
  I have a index with 100GB of size. This index increase 10GB per year.
  (I don't have information how many docs they have) and the docs never
  will be deleted.  Thinking in 30 years, the index will be with 400GB
  of size.
 
  I think  is not required to break in shard, because i not consider
  this like a large index. Am I correct? What's is a real large
  index
 
 
  Thanks





Size of fields from one document (monitoring, debugging)

2012-01-18 Thread Vadim Kisselmann
Hello folks,

is it possible to find out the size (in KB) of specific fields from
one document? Perhaps with Luke or Lucid Gaze?
My case:
docs in my old index (Solr 1.4) have sizes of 3-4KB each.
In my new index (Solr 4.0 trunk) it is about 15KB per doc.
I changed only 2 things in my schema.xml. I added the
ReversedWildcardFilterFactory(indexing) and one field (LatLonType,
stored and indexed).
My content is more or less the same.
I would like to debug this to refactor my schema.xml.

The newest Luke Version(3.5) doesn't work with Solr 4.0 from trunk, so
i can't test it.
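While Luke is broken against trunk, at least the stored side can be inspected with a few lines of Lucene. A rough sketch against the Lucene 3.x API (trunk's API differs slightly, e.g. IndexableField instead of Fieldable); it only measures stored values, so indexed-only data like term vectors is not covered, and the index path and doc id are arguments:

import java.io.File;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Fieldable;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

public class StoredFieldSizes {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])));
        int docId = Integer.parseInt(args[1]);        // internal Lucene doc id
        Map<String, Integer> charsPerField = new HashMap<String, Integer>();
        Document doc = reader.document(docId);
        for (Fieldable f : doc.getFields()) {
            String v = f.stringValue();               // null for binary fields
            int len = (v == null) ? 0 : v.length();
            Integer old = charsPerField.get(f.name());
            charsPerField.put(f.name(), (old == null ? 0 : old) + len);
        }
        System.out.println(charsPerField);            // rough chars per stored field
        reader.close();
    }
}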

Cheers
Vadim


Re: Weird docs-id clustering output in Solr 1.4.1

2011-12-01 Thread Vadim Kisselmann
Hi Stanislaw,
did you already have time to create a patch?
If not, can you tell me please which lines in which class in source code
are relevant?
Thanks and regards
Vadim Kisselmann



2011/11/29 Vadim Kisselmann v.kisselm...@googlemail.com

 Hi,
 the quick and dirty way sound good:)
 It would be great if you can send me a patch for 1.4.1.


 By the way, i tested Solr. 3.5 with my 1.4.1 test index.
 I can search and optimize, but clustering doesn't work (java.lang.Integer
 cannot be cast to java.lang.String)
 My uniqieKey for my docs it the id(sint).
 These here was the error message:


 Problem accessing /solr/select/. Reason:

Carrot2 clustering failed

 org.apache.solr.common.SolrException: Carrot2 clustering failed
at
 org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:217)
at
 org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91)
at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast
 to java.lang.String
at
 org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.getDocuments(CarrotClusteringEngine.java:364)
at
 org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:201)
... 23 more

 It this case it's better for me to upgrade/patch the 1.4.1 version.

 Best regards
 Vadim




 2011/11/29 Stanislaw Osinski stanislaw.osin...@carrotsearch.com

 
  But my actual live system works on solr 1.4.1. i can only change my
  solrconfig.xml and integrate new packages...
  i check the possibility to upgrade from 1.4.1 to 3.5 with the same index
  (without reinidex) with luceneMatchVersion 2.9.
  i hope it works...
 

 Another option would be to check out Solr 1.4.1 source code, fix the issue
 and recompile the clustering component. The quick and dirty way would be
 to
 convert all identifiers to strings in the clustering component, before the
 they are returned for serialization (I can send you a patch that does
 this). The proper way would be to fix the root cause of the problem, but
 I'd need to dig deeper into the code to find this.

 Staszek





Re: Error in New Solr version

2011-12-01 Thread Vadim Kisselmann
Hi,
comment out the lines with the collapse component in your solrconfig.xml if
not need it.
otherwise, you're missing the right jar's for this component, or path's to
this jars in your solrconfig.xml are wrong.
regards
vadim



2011/12/1 Pawan Darira pawan.dar...@gmail.com

 Hi

 I am migrating from Solr 1.4 to Solr 3.2. I am getting below error in my
 logs

 org.apache.solr.common.SolrException: Error loading class
 'org.apache.solr.handler.component.CollapseComponent

 Could not found satisfactory solution on google. please help

 thanks
 Pawan



Re: Weird docs-id clustering output in Solr 1.4.1

2011-12-01 Thread Vadim Kisselmann
Hi Stanislaw,

unfortunately it doesn't work.
I changed line 216 to the new toString() version and rebuilt the
source.
Still the same behavior, and no errors (because of the change).
Is there another line to change?

Thanks and regards
Vadim



2011/12/1 Stanislaw Osinski stanislaw.osin...@carrotsearch.com

 Hi Vadim,

 I've had limited connectivity, so I couldn't check out the complete 1.4.1
 code and test the changes. Here's what you can try:

 In this file:


 http://svn.apache.org/viewvc/lucene/solr/tags/release-1.4.1/contrib/clustering/src/main/java/org/apache/solr/handler/clustering/carrot2/CarrotClusteringEngine.java?revision=957515view=markup

 around line 216 you will see:

 for (Document doc : docs) {
  docList.add(doc.getField(solrId));
 }

 You need to change this to:

 for (Document doc : docs) {
  docList.add(doc.getField(solrId).toString());
 }

 Let me know if this did the trick.

 Cheers,

 S.

 On Thu, Dec 1, 2011 at 10:43, Vadim Kisselmann
 v.kisselm...@googlemail.comwrote:

  Hi Stanislaw,
  did you already have time to create a patch?
  If not, can you tell me please which lines in which class in source code
  are relevant?
  Thanks and regards
  Vadim Kisselmann
 
 
 
  2011/11/29 Vadim Kisselmann v.kisselm...@googlemail.com
 
   Hi,
   the quick and dirty way sound good:)
   It would be great if you can send me a patch for 1.4.1.
  
  
   By the way, i tested Solr. 3.5 with my 1.4.1 test index.
   I can search and optimize, but clustering doesn't work
 (java.lang.Integer
   cannot be cast to java.lang.String)
   My uniqieKey for my docs it the id(sint).
   These here was the error message:
  
  
   Problem accessing /solr/select/. Reason:
  
  Carrot2 clustering failed
  
   org.apache.solr.common.SolrException: Carrot2 clustering failed
  at
  
 
 org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:217)
  at
  
 
 org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91)
  at
  
 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
  at
  
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
  at
  
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  at
  
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  at
  
 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at
  
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
  at
  
 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at
  
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at
  
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at
  org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at
  
 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at
  
 
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at
  
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:326)
  at
   org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
  at
  
 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
  at
  
 
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
  at
  
 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
   Caused by: java.lang.ClassCastException: java.lang.Integer cannot be
 cast
   to java.lang.String
  at
  
 
 org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.getDocuments(CarrotClusteringEngine.java:364)
  at
  
 
 org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:201)
  ... 23 more
  
   It this case it's better for me to upgrade/patch the 1.4.1 version.
  
   Best regards
   Vadim
  
  
  
  
   2011/11/29 Stanislaw Osinski stanislaw.osin...@carrotsearch.com
  
   
But my actual live system works on solr 1.4.1. i can only change my
solrconfig.xml and integrate new packages...
i check the possibility to upgrade from 1.4.1 to 3.5 with the same
  index
(without reinidex) with luceneMatchVersion 2.9.
i hope it works...
   
  
   Another option would be to check out Solr 1.4.1 source code, fix the
  issue
   and recompile the clustering component. The quick and dirty way would
 be
   to
   convert all identifiers to strings

Weird docs-id clustering output in Solr 1.4.1

2011-11-29 Thread Vadim Kisselmann
Hi folks,
i've installed the clustering component in solr 1.4.1 and it works, but not
really:)

You can see that the doc ids are corrupt.

<arr name="clusters"><lst>
<arr name="labels">
<str>Euro-Krise</str>
</arr><arr name="docs">
<str>½Íџ</str>
<str>¾౥ͽ</str>
<str>¿)ై</str>
<str>ˆ࡯׸</str>
</arr></lst>

my fields:
<field name="id" type="sint" indexed="true" stored="true" required="true"/>
<field name="url" type="string" indexed="false" stored="true"
required="true"/>
<field name="title" type="customtext" indexed="false" stored="true"
required="true"/>
<field name="text" type="customtext" indexed="false" stored="true"
multiValued="true" compressed="true"/>

and my config-snippets:
<str name="carrot.title">title</str>
 <str name="carrot.url">id</str>
 <!-- The field to cluster on -->
 <str name="carrot.snippet">text</str>

i changed my config snippets (carrot.url=id, url, title..) but the
result is the same.
anyone an idea?

best regards and thanks
vadim


Re: Weird docs-id clustering output in Solr 1.4.1

2011-11-29 Thread Vadim Kisselmann
Hello Staszek,

thanks for testing:)
i think the same (serialization issue, int to string).
This config works fine with solr 4.0 in my test cluster, i think with 3.5
too, without problems.
But my actual live system runs on solr 1.4.1. i can only change my
solrconfig.xml and integrate new packages...
i will check the possibility to upgrade from 1.4.1 to 3.5 with the same index
(without reindexing) with luceneMatchVersion 2.9.
i hope it works...

Thanks and regards
Vadim


2011/11/29 Stanislaw Osinski stanis...@osinski.name

 Hi,

 It looks like some serialization issue related to writing integer ids to
 the output. I've just tried a similar configuration on Solr 3.5 and the
 integer identifiers looked fine. Can you try the same configuration on Solr
 3.5?

 Thanks,

 Staszek

 On Tue, Nov 29, 2011 at 12:03, Vadim Kisselmann 
 v.kisselm...@googlemail.com
  wrote:

  Hi folks,
  i've installed the clustering component in solr 1.4.1 and it works, but
 not
  really:)
 
  You can see what the doc id is corrupt.
 
  arr name=clusterslst
  arr name=labels
  strEuro-Krise/str
  /arrarr name=docs
  str½Íџ/str
  str¾౥ͽ/str
  str¿)ై/str
  strˆ࡯׸/str
  /arr/lst
 
  my fields:
  field name=id type=sint indexed=true stored=true
 required=true/
  field name=url type=string indexed=false stored=true
  required=true/
  field name=title type=customtext indexed=false stored=true
  required=true/
  field name=text type=customtext indexed=false stored=true
  multiValued=true compressed=true/
 
  and my config-snippets:
  str name=carrot.titletitle/str
   str name=carrot.urlid/str
   !-- The field to cluster on --
   str name=carrot.snippettext/str
 
  i changed my config snippets (carrot.url=id, url, title..) but the
  result is the same.
  anyone an idea?
 
  best regards and thanks
  vadim
 



Re: Weird docs-id clustering output in Solr 1.4.1

2011-11-29 Thread Vadim Kisselmann
Hi,
the quick and dirty way sound good:)
It would be great if you can send me a patch for 1.4.1.


By the way, i tested Solr. 3.5 with my 1.4.1 test index.
I can search and optimize, but clustering doesn't work (java.lang.Integer
cannot be cast to java.lang.String)
My uniqieKey for my docs it the id(sint).
These here was the error message:


Problem accessing /solr/select/. Reason:

   Carrot2 clustering failed

org.apache.solr.common.SolrException: Carrot2 clustering failed
   at
org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:217)
   at
org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91)
   at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
   at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
   at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
   at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
   at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
   at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
   at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast
to java.lang.String
   at
org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.getDocuments(CarrotClusteringEngine.java:364)
   at
org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:201)
   ... 23 more

It this case it's better for me to upgrade/patch the 1.4.1 version.

Best regards
Vadim




2011/11/29 Stanislaw Osinski stanislaw.osin...@carrotsearch.com

 
  But my actual live system works on solr 1.4.1. i can only change my
  solrconfig.xml and integrate new packages...
  i check the possibility to upgrade from 1.4.1 to 3.5 with the same index
  (without reindex) with luceneMatchVersion 2.9.
  i hope it works...
 

 Another option would be to check out Solr 1.4.1 source code, fix the issue
 and recompile the clustering component. The quick and dirty way would be to
 convert all identifiers to strings in the clustering component, before
 they are returned for serialization (I can send you a patch that does
 this). The proper way would be to fix the root cause of the problem, but
 I'd need to dig deeper into the code to find this.

 Staszek



Re: how to : multicore setup with same config files

2011-11-23 Thread Vadim Kisselmann
Hi,
yes, see http://wiki.apache.org/solr/DistributedSearch
Regards
Vadim
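For illustration, a distributed query across two cores could look like this (a sketch only; host, port and core names are the example ones, not a tested setup):

http://localhost:8983/solr/core0/select?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1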


2011/11/2 Val Minyaylo vminya...@centraldesktop.com

 Have you tried to query multiple cores at the same time?


 On 10/31/2011 8:30 AM, Vadim Kisselmann wrote:

 it works.
 it was one wrongly placed backslash in my config ;)
 sharing the config/schema files is not a problem.
 regards vadim


 2011/10/31 Vadim 
 Kisselmannv.kisselmann@**googlemail.comv.kisselm...@googlemail.com
 

  Hi folks,

 i have a small blockade in the configuration of an multicore setup.
 i use the latest solr version (4.0) from trunk and the example (with
 jetty).
 single core is running without problems.

 We assume that i have this structure:

 /solr-trunk/solr/example/**multicore/

solr.xml

core0/

core1/


 /solr-data/

   /conf/

 schema.xml

 solrconfig.xml

   /data/

 core0/

   index

 core1/

   index


 I want to share the config files (same instanceDir but different dataDir)

 How can i configure this so that it works(solrconfig.xml, solr.xml)?

 Do i need the directories for core0/core1 in solr-trunk/...?


 I found issues in Jira with old patches which unfortunately doesn't work.


 Thanks and Regards

 Vadim









Re: InvalidTokenOffsetsException when using MappingCharFilterFactory, DictionaryCompoundWordTokenFilterFactory and Highlighting

2011-11-11 Thread Vadim Kisselmann
Hi Edwin, Chris

It's an old bug. I have big problems too with OffsetExceptions when I use
highlighting or Carrot.
It looks like a problem with HTMLStripCharFilter.
The patch doesn't work.

https://issues.apache.org/jira/browse/LUCENE-2208

Regards
Vadim



2011/11/11 Edwin Steiner edwin.stei...@gmail.com

 I just entered a bug: https://issues.apache.org/jira/browse/SOLR-2891

 Thanks  regards, Edwin

 On Nov 7, 2011, at 8:47 PM, Chris Hostetter wrote:

 
  : finally I want to use Solr highlighting. But there seems to be a
 problem
  : if I combine the char filter and the compound word filter in
 combination
  : with highlighting (an
  : org.apache.lucene.search.highlight.InvalidTokenOffsetsException is
  : raised).
 
  Definitely sounds like a bug somwhere in dealing with the offsets.
 
  can you please file a Jira, and include all of the data you have provided
  here?  it would also be helpful to know what the analysis tool says about
  the various attributes of your tokens at each stage of the analysis?
 
  : SEVERE: org.apache.solr.common.SolrException:
 org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token fall
 exceeds length of provided text sized 12
  : at
 org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:469)
  : at
 org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:378)
  : at
 org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
  : at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
  : at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  : at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
  : at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  : at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  : at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  : at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  : at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
  : at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
  : at
 org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462)
  : at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
  : at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
  : at
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:851)
  : at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  : at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405)
  : at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:278)
  : at
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515)
  : at
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302)
  : at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  : at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  : at java.lang.Thread.run(Thread.java:680)
  : Caused by:
 org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token fall
 exceeds length of provided text sized 12
  : at
 org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:228)
  : at
 org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:462)
  : ... 23 more
 
 
  -Hoss




Similar documents and advantages / disadvantages of MLT / Deduplication

2011-11-07 Thread Vadim Kisselmann
Hello folks,

I have questions about MLT and Deduplication and what would be the best
choice in my case.

Case:

I index 1000 docs, 5 of them are 95% the same (for example: copy-pasted
blog articles from different sources, with slight changes (author name,
etc.)).
But they have differences.
*Now I would like to see 1 doc in my result set and the other 4 should be marked
as similar.*

With *MLT*:
<str name="mlt.fl">text</str>
  <int name="mlt.minwl">5</int>
  <int name="mlt.maxwl">50</int>
  <int name="mlt.maxqt">3</int>
  <int name="mlt.maxntp">5000</int>
  <bool name="mlt.boost">true</bool>
  <str name="mlt.qf">text</str>
</lst>

With this config I get about 500 similar docs for this one doc, which is
unfortunately too much.


*Deduplication*:
I now index these docs with a signature, and I'm using TextProfileSignature.

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature_t</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">text</str>
    <str name="signatureClass">solr.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

How can i compare the created signatures?


I only want to see the 5 similar docs, nothing else.
Which of these two approaches is relevant to me? Can I tune MLT for my
requirement? Or should I use dedupe?
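One way to compare the created signatures, sketched here only as an assumption (it presumes the signature field is indexed as a single, untokenized token, e.g. a string type), is to facet on the field and look for values that occur more than once:

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=signature_t&facet.mincount=2

Each facet value with a count greater than 1 then identifies one group of near-duplicate documents.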

Thanks and Regards
Vadim


shard indexing

2011-11-02 Thread Vadim Kisselmann
Hello folks,
I have a problem with shard indexing.

With a single core I use this update command:
http://localhost:8983/solr/update .

Now I have 2 shards; we can call them core0 / core1:
http://localhost:8983/solr/core0/update .


Can I adjust anything so that I can index in the same way as with a single core,
without the core name?

thanks and regards
vadim


Re: shard indexing

2011-11-02 Thread Vadim Kisselmann
Hello Jan,

thanks for your quick response.

It's quite difficult to explain:
We want to create new shards on the fly every month and switch the default
shard to the newest one.
We always want to index to the newest shard with the same update query
like http://localhost:8983/solr/update (content stream).

Is it possible to implement our idea?

Thanks in advance.
Regards

Vadim





2011/11/2 Jan Høydahl jan@cominvent.com

 Hi,

 The only difference is the core name in the URL, which should be easy
 enough to handle from your indexing client code. I don't really understand
 the reason behind your request. How would you control which core to index
 your document to if you did not specify it in the URL?

 You could name ONE of your cores as ., meaning it would be the default
 core living at /solr/update, perhaps that is what you're looking for?

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 2. nov. 2011, at 10:00, Vadim Kisselmann wrote:

  Hello folks,
  i have an problem with shard indexing.
 
  with an single core i use this update command:
  http://localhost:8983/solr/update .
 
  now i have 2 shards, we can call them core0 / core1
  http://localhost:8983/solr/core0/update .
 
 
  can i adjust anything to indexing in the same way like with a single core
  without core-name?
 
  thanks and regards
  vadim




Re: shard indexing

2011-11-02 Thread Vadim Kisselmann
Hello Yury,

thanks for your response.
This is exactly my plan. But defaultCoreName is buggy. When I use it
(defaultCoreName=core_november), the default core gets deleted.
I think this was the issue:
https://issues.apache.org/jira/browse/SOLR-2127

Do you use this feature and did it work?

Thanks and Regards
Vadim
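For reference, a minimal sketch of how the attribute is meant to be used in solr.xml (the core name is only an example, not the actual config here):

<cores adminPath="/admin/cores" defaultCoreName="core_november">
  <core name="core_november" instanceDir="core_november"/>
</cores>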




2011/11/2 Yury Kats yuryk...@yahoo.com

 There's a defaultCoreName parameter in solr.xml that lets you specify what
 core should be used when none is specified in the URL. You can change that
 every time you create a new core.



 
 From: Vadim Kisselmann v.kisselm...@googlemail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, November 2, 2011 6:16 AM
 Subject: Re: shard indexing
 
 Hello Jan,
 
 thanks for your quick response.
 
 It's quite difficult to explain:
 We want to create new shards on the fly every month and switch the default
 shard to the newest one.
 We always want to index to the newest shard with the same update query
 like  http://localhost:8983/solr/update.(content stream)
 
 Is our idea possible to implement?
 
 Thanks in advance.
 Regards
 
 Vadim
 
 
 
 
 
 2011/11/2 Jan Høydahl jan@cominvent.com
 
  Hi,
 
  The only difference is the core name in the URL, which should be easy
  enough to handle from your indexing client code. I don't really
 understand
  the reason behind your request. How would you control which core to
 index
  your document to if you did not specify it in the URL?
 
  You could name ONE of your cores as ., meaning it would be the
 default
  core living at /solr/update, perhaps that is what you're looking for?
 
  --
  Jan Høydahl, search solution architect
  Cominvent AS - www.cominvent.com
  Solr Training - www.solrtraining.com
 
  On 2. nov. 2011, at 10:00, Vadim Kisselmann wrote:
 
   Hello folks,
   i have an problem with shard indexing.
  
   with an single core i use this update command:
   http://localhost:8983/solr/update .
  
   now i have 2 shards, we can call them core0 / core1
   http://localhost:8983/solr/core0/update .
  
  
   can i adjust anything to indexing in the same way like with a single
 core
   without core-name?
  
   thanks and regards
   vadim
 
 
 
 
 



Re: shard indexing

2011-11-02 Thread Vadim Kisselmann
Hello Jan,

Personally I think the same (switch the URL in my indexing code), but my
requirement is to use the same query.
Thanks for suggesting this trick. It's a great idea which could work in
my case; I'll test it.
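For illustration, the "current" core trick Jan describes below could be driven with the CoreAdmin API; a sketch only, with example names and paths:

http://localhost:8983/solr/admin/cores?action=CREATE&name=november&instanceDir=november
http://localhost:8983/solr/admin/cores?action=SWAP&core=current&other=november

After the SWAP, "current" points at the freshly created core and the old data stays reachable under "november", so the update URL can stay the same.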

Regards
Vadim



2011/11/2 Jan Høydahl jan@cominvent.com

 Personally I think it is better to be explicit about where you index, so
 that when you create a new shard december, you also switch the URL for
 your indexing code.

 I suppose one trick you could use is to have a core called current,
 which now would be for november, and once you get to december, you create a
 november core, and do a SWAP between current and november. Then your
 new core would now be current and you don't need to change URLs on the
 index client side.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 2. nov. 2011, at 11:16, Vadim Kisselmann wrote:

  Hello Jan,
 
  thanks for your quick response.
 
  It's quite difficult to explain:
  We want to create new shards on the fly every month and switch the
 default
  shard to the newest one.
  We always want to index to the newest shard with the same update query
  like  http://localhost:8983/solr/update.(content stream)
 
  Is our idea possible to implement?
 
  Thanks in advance.
  Regards
 
  Vadim
 
 
 
 
 
  2011/11/2 Jan Høydahl jan@cominvent.com
 
  Hi,
 
  The only difference is the core name in the URL, which should be easy
  enough to handle from your indexing client code. I don't really
 understand
  the reason behind your request. How would you control which core to
 index
  your document to if you did not specify it in the URL?
 
  You could name ONE of your cores as ., meaning it would be the
 default
  core living at /solr/update, perhaps that is what you're looking for?
 
  --
  Jan Høydahl, search solution architect
  Cominvent AS - www.cominvent.com
  Solr Training - www.solrtraining.com
 
  On 2. nov. 2011, at 10:00, Vadim Kisselmann wrote:
 
  Hello folks,
  i have an problem with shard indexing.
 
  with an single core i use this update command:
  http://localhost:8983/solr/update .
 
  now i have 2 shards, we can call them core0 / core1
  http://localhost:8983/solr/core0/update .
 
 
  can i adjust anything to indexing in the same way like with a single
 core
  without core-name?
 
  thanks and regards
  vadim
 
 




how to : multicore setup with same config files

2011-10-31 Thread Vadim Kisselmann
Hi folks,

I'm a bit stuck with the configuration of a multicore setup.
I use the latest Solr version (4.0) from trunk and the example (with Jetty).
A single core is running without problems.

Let's assume that I have this structure:

/solr-trunk/solr/example/multicore/
    solr.xml
    core0/
    core1/

/solr-data/
    /conf/
        schema.xml
        solrconfig.xml
    /data/
        core0/
            index
        core1/
            index


I want to share the config files (same instanceDir but different dataDir).

How can I configure this so that it works (solrconfig.xml, solr.xml)?

Do I need the directories for core0/core1 in solr-trunk/...?


I found issues in Jira with old patches which unfortunately don't work.


Thanks and Regards

Vadim


Re: how to : multicore setup with same config files

2011-10-31 Thread Vadim Kisselmann
It works.
It was one wrongly placed backslash in my config ;)
Sharing the config/schema files is not a problem.
Regards, Vadim
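For anyone hitting the same question, a minimal solr.xml sketch for the layout above, sharing one conf/ directory while keeping separate data directories (the paths and attribute usage are assumptions, not the exact config used here):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="/solr-data/" dataDir="/solr-data/data/core0"/>
    <core name="core1" instanceDir="/solr-data/" dataDir="/solr-data/data/core1"/>
  </cores>
</solr>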


2011/10/31 Vadim Kisselmann v.kisselm...@googlemail.com

 Hi folks,

 i have a small blockade in the configuration of an multicore setup.
 i use the latest solr version (4.0) from trunk and the example (with
 jetty).
 single core is running without problems.

 We assume that i have this structure:

 /solr-trunk/solr/example/multicore/

solr.xml

core0/

core1/


 /solr-data/

   /conf/

 schema.xml

 solrconfig.xml

   /data/

 core0/

   index

 core1/

   index


 I want to share the config files (same instanceDir but different dataDir)

 How can i configure this so that it works(solrconfig.xml, solr.xml)?

 Do i need the directories for core0/core1 in solr-trunk/...?


 I found issues in Jira with old patches which unfortunately doesn't work.


 Thanks and Regards

 Vadim








Re: LUCENE-2208 (SOLR-1883) Bug with HTMLStripCharFilter, given patch in next nightly build?

2011-10-21 Thread Vadim Kisselmann
UPDATE:
I checked out the latest trunk version and patched it with the patch from
LUCENE-2208.
This patch seems not to work, or I have done something wrong.

My old log snippets:

Http - 500 Internal Server Error
Error: Carrot2 clustering failed

And this was caused by:
Http - 500 Internal Server Error
Error: org.apache.lucene.search.highlight.InvalidTokenOffsetsException:
Token the exceeds length of provided text sized 41

Best Regards
Vadim





2011/10/20 Vadim Kisselmann v.kisselm...@googlemail.com

 Hello folks,

 i have big problems with InvalidTokenOffsetExceptions with highlighting.
 Looks like a bug in HTMLStripCharFilter.

 H.Wang added a patch in LUCENE-2208, but nobody have time to look at this.
 Could someone of the committers please take a look at this patch and commit
 it or is this problem more complicated as i think? :)
 Thanks guys...

 Best Regards
 Vadim





LUCENE-2208 (SOLR-1883) Bug with HTMLStripCharFilter, given patch in next nightly build?

2011-10-20 Thread Vadim Kisselmann
Hello folks,

I have big problems with InvalidTokenOffsetsException when highlighting.
It looks like a bug in HTMLStripCharFilter.

H. Wang added a patch in LUCENE-2208, but nobody has had time to look at it.
Could one of the committers please take a look at this patch and commit
it, or is this problem more complicated than I think? :)
Thanks guys...

Best Regards
Vadim


Re: millions of records problem

2011-10-17 Thread Vadim Kisselmann
Hi,
a number of relevant questions has already been asked.
I have another one:
Which type of docs do you have? Do you add new docs every day, or is it
a stable number of docs (500 mio.)?
What about replication?

Regards Vadim


2011/10/17 Otis Gospodnetic otis_gospodne...@yahoo.com

 Hi Jesús,

 Others have already asked a number of relevant question.  If I had to
 guess, I'd guess this is simply a disk IO issue, but of course there may be
 room for improvement without getting more RAM or SSDs, so tell us more about
 your queries, about disk IO you are seeing, etc.

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/


 
 From: Jesús Martín García jmar...@cesca.cat
 To: solr-user@lucene.apache.org
 Sent: Monday, October 17, 2011 6:19 AM
 Subject: millions of records problem
 
 Hi,
 
 I've got 500 millions of documents in solr everyone with the same number
 of fields an similar width. The version of solr which I used is 1.4.1 with
 lucene 2.9.3.
 
 I don't have the option to use shards so the whole index has to be in a
 machine...
 
  The size of the index is about 50Gb and the RAM is 8Gb. Everything is
  working but the searches are very slow, although I tried different
 configurations of the solrconfig.xml as:
 
 - configure a first searcher with the most used searches
 - configure the caches (query, filter and document) with great numbers...
 
 but everything is still working slowly, so do you have any ideas to boost
 the searches without the penalty to use much more ram?
 
 Thanks in advance,
 
 Jesús
 
 -- ...
   __
 /   /   Jesús Martín García
 C E / S / C A   Tècnic de Projectes
   /__ / Centre de Serveis Científics i Acadèmics de Catalunya
 
 Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona
 T. 93 551 6213 · F. 93 205 6979 · jmar...@cesca.cat
 ...
 
 
 
 



Morelikethis understanding question

2011-10-14 Thread Vadim Kisselmann
Hello folks,
I have a question about MLT.

For example my query:

localhost:8983/solr/mlt/?q=gefechtseinsatz+AND+dna&mlt=true&mlt.fl=text&mlt.count=0&mlt.boost=true&mlt.mindf=5&mlt.mintf=5&mlt.minwl=4

*I have 1 query result and 13 MLT docs. The MLT result corresponds to
half of my index.*
In my case I want *just those docs which have at least half of the words
from my query result document*; they should be very similar.
How should I set my parameters to achieve this?

Thanks and Regards
Vadim


Re: strange performance issue with many shards on one server

2011-09-28 Thread Vadim Kisselmann
Hi Fred,
analyze the queries which take longer.
We observe our queries and see problems with q-time for queries which
are complex, with phrase queries or queries which contain numbers or
special characters.
In case you don't know it:
http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
Regards
Vadim


2011/9/28 Frederik Kraus frederik.kr...@gmail.com

  Hi,


 I am experiencing a strange issue doing some load tests. Our setup:

 - 2 server with each 24 cpu cores, 130GB of RAM
 - 10 shards per server (needed for response times) running in a single
 tomcat instance
 - each query queries all 20 shards (distributed search)

 - each shard holds about 1.5 mio documents (small shards are needed due to
 rather complex queries)
 - all caches are warmed / high cache hit rates (99%) etc.


 Now for some reason we cannot seem to fully utilize all CPU power (no disk
 IO), i.e. at a certain point increasing concurrent users doesn't increase CPU load,
 but decreases throughput and increases the response times of the individual
 queries.

 Also 1-2% of the queries take significantly longer: avg somewhere at 100ms
 while 1-2% take 1.5s or longer.

 Any ideas are greatly appreciated :)

 Fred.




Re: strange performance issue with many shards on one server

2011-09-28 Thread Vadim Kisselmann
Hi Fred,

OK, that's strange behavior with the same queries.
A few more questions:
- which Solr version?
- do you index during your load test (because of index rebuilds)?
- do you replicate your index?

Regards
Vadim



2011/9/28 Frederik Kraus frederik.kr...@gmail.com

 Hi Vladim,

 the thing is, that those exact same queries, that take longer during a load
 test, perform just fine when executed at a slower request rate and are also
 random, i.e. there is no pattern in bad/slow queries.

 My first thought was some kind of contention and/or connection starvation
 for the internal shard communication?

 Fred.


 Am Mittwoch, 28. September 2011 um 13:18 schrieb Vadim Kisselmann:

  Hi Fred,
  analyze the queries which take longer.
  We observe our queries and see the problems with q-time with queries
 which
  are complex, with phrase queries or queries which contains numbers or
  special characters.
  if you don't know it:
 
 http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
  Regards
  Vadim
 
 
  2011/9/28 Frederik Kraus frederik.kr...@gmail.com (mailto:
 frederik.kr...@gmail.com)
 
Hi,
  
  
   I am experiencing a strange issue doing some load tests. Our setup:
  
   - 2 server with each 24 cpu cores, 130GB of RAM
   - 10 shards per server (needed for response times) running in a single
   tomcat instance
   - each query queries all 20 shards (distributed search)
  
   - each shard holds about 1.5 mio documents (small shards are needed due
 to
   rather complex queries)
   - all caches are warmed / high cache hit rates (99%) etc.
  
  
   Now for some reason we cannot seem to fully utilize all CPU power (no
 disk
   IO), ie. increasing concurrent users doesn't increase CPU-Load at a
 point,
   decreases throughput and increases the response times of the individual
   queries.
  
   Also 1-2% of the queries take significantly longer: avg somewhere at
 100ms
   while 1-2% take 1.5s or longer.
  
   Any ideas are greatly appreciated :)
  
   Fred.




Re: Still too many files after running solr optimization

2011-09-28 Thread Vadim Kisselmann
Why should the optimization reduce the number of files?
That happens only when you index docs with the same unique key.

Do you have differences between numDocs and maxDocs after optimize?
If yes:
what does your optimize command look like?
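For reference, an explicit optimize is typically sent like this (just an example invocation):

curl http://localhost:8983/solr/update --data-binary '<optimize/>' -H 'Content-type:text/xml; charset=utf-8'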

Regards
Vadim



2011/9/28 Manish Bafna manish.bafna...@gmail.com

 Try to do optimize twice.
 The 2nd one will be quick and will delete lot of files.

 On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com
 wrote:
  Hi,
 
  I am using solr 3.3. I noticed  that after indexing about 700, 000
 records
  and running optimization at the end, i still have about 91 files in my
 index
  directory. I thought that optimization was supposed to reduce the number
 of
  files.
 
  My settings are the default that came with Solr (mergefactor, etc)
 
  Any ideas what i could be doing wrong?
 



Re: Still too many files after running solr optimization

2011-09-28 Thread Vadim Kisselmann
If numDocs and maxDocs have the same number of docs, nothing will be deleted
on optimize.
You only rebuild your index.

Regards
Vadim




2011/9/28 Kissue Kissue kissue...@gmail.com

 numDocs and maxDocs are same size.

 I was worried because when i used to use only Lucene for the same indexing,
 before optimization there are many files but after optimization i always
 end
 up with just 3 files in my index folder. Just want to find out if this was
 ok.

 Thanks

 On Wed, Sep 28, 2011 at 1:23 PM, Vadim Kisselmann 
 v.kisselm...@googlemail.com wrote:

  why should the optimization reduce the number of files?
  It happens only when you indexing docs with same unique key.
 
  Have you differences in numDocs und maxDocs after optimize?
  If yes:
  how is your optimize command ?
 
  Regards
  Vadim
 
 
 
  2011/9/28 Manish Bafna manish.bafna...@gmail.com
 
   Try to do optimize twice.
   The 2nd one will be quick and will delete lot of files.
  
   On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com
   wrote:
Hi,
   
I am using solr 3.3. I noticed  that after indexing about 700, 000
   records
and running optimization at the end, i still have about 91 files in
 my
   index
directory. I thought that optimization was supposed to reduce the
  number
   of
files.
   
My settings are the default that came with Solr (mergefactor, etc)
   
Any ideas what i could be doing wrong?
   
  
 



Re: strange performance issue with many shards on one server

2011-09-28 Thread Vadim Kisselmann
)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:554)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
 at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
 at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
 at java.lang.Thread.run(Thread.java:662)






 Am Mittwoch, 28. September 2011 um 13:53 schrieb Frederik Kraus:

 
 
  Am Mittwoch, 28. September 2011 um 13:41 schrieb Vadim Kisselmann:
 
   Hi Fred,
  
   ok, it's a strange behavior with same queries.
   Another questions:
   -which solr version?
 
  3.3 (might the NIOFSDirectory from 3.4 help?)
 
   -do you indexing during your load test? (because of index rebuilt)
  nope
 
   -do you replicate your index?
 
  nope
  
   Regards
   Vadim
  
  
  
   2011/9/28 Frederik Kraus frederik.kr...@gmail.com (mailto:
 frederik.kr...@gmail.com)
  
Hi Vladim,
   
the thing is, that those exact same queries, that take longer during
 a load
test, perform just fine when executed at a slower request rate and
 are also
random, i.e. there is no pattern in bad/slow queries.
   
My first thought was some kind of contention and/or connection
 starvation
for the internal shard communication?
   
Fred.
   
   
Am Mittwoch, 28. September 2011 um 13:18 schrieb Vadim Kisselmann:
   
 Hi Fred,
 analyze the queries which take longer.
 We observe our queries and see the problems with q-time with
 queries
which
 are complex, with phrase queries or queries which contains numbers
 or
 special characters.
 if you don't know it:
   
 http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
 Regards
 Vadim


 2011/9/28 Frederik Kraus frederik.kr...@gmail.com (mailto:
 frederik.kr...@gmail.com) (mailto:
frederik.kr...@gmail.com (mailto:frederik.kr...@gmail.com))

   Hi,
 
 
  I am experiencing a strange issue doing some load tests. Our
 setup:
 
  - 2 server with each 24 cpu cores, 130GB of RAM
  - 10 shards per server (needed for response times) running in a
 single
  tomcat instance
  - each query queries all 20 shards (distributed search)
 
  - each shard holds about 1.5 mio documents (small shards are
 needed due
to
  rather complex queries)
  - all caches are warmed / high cache hit rates (99%) etc.
 
 
  Now for some reason we cannot seem to fully utilize all CPU power
 (no
disk
  IO), ie. increasing concurrent users doesn't increase CPU-Load at
 a
point,
  decreases throughput and increases the response times of the
 individual
  queries.
 
  Also 1-2% of the queries take significantly longer: avg somewhere
 at
100ms
  while 1-2% take 1.5s or longer.
 
  Any ideas are greatly appreciated :)
 
  Fred.




Re: Still too many files after running solr optimization

2011-09-28 Thread Vadim Kisselmann
2011/9/28 Manish Bafna manish.bafna...@gmail.com

 Will it not merge the index?


yes


 While merging on windows, the old index files dont get deleted.
 (Windows has an issue where the file opened for reading cannot be
 deleted)
 
 So, if you call optimize again, it will delete the older index files.

No.
During optimize you only delete docs which are flagged as deleted, no
matter how old they are.
If your numDocs and maxDocs have the same number of docs, you only rebuild
and merge your index, but you delete nothing.

Regards




 On Wed, Sep 28, 2011 at 6:43 PM, Vadim Kisselmann
 v.kisselm...@googlemail.com wrote:
  if numDocs und maxDocs have the same mumber of docs nothing will be
 deleted
  on optimize.
  You only rebuild your index.
 
  Regards
  Vadim
 
 
 
 
  2011/9/28 Kissue Kissue kissue...@gmail.com
 
  numDocs and maxDocs are same size.
 
  I was worried because when i used to use only Lucene for the same
 indexing,
  before optimization there are many files but after optimization i always
  end
  up with just 3 files in my index filder. Just want to find out if this
 was
  ok.
 
  Thanks
 
  On Wed, Sep 28, 2011 at 1:23 PM, Vadim Kisselmann 
  v.kisselm...@googlemail.com wrote:
 
   why should the optimization reduce the number of files?
   It happens only when you indexing docs with same unique key.
  
   Have you differences in numDocs und maxDocs after optimize?
   If yes:
   how is your optimize command ?
  
   Regards
   Vadim
  
  
  
   2011/9/28 Manish Bafna manish.bafna...@gmail.com
  
Try to do optimize twice.
The 2nd one will be quick and will delete lot of files.
   
On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com
 
wrote:
 Hi,

 I am using solr 3.3. I noticed  that after indexing about 700, 000
records
 and running optimization at the end, i still have about 91 files
 in
  my
index
 directory. I thought that optimization was supposed to reduce the
   number
of
 files.

 My settings are the default that came with Solr (mergefactor, etc)

 Any ideas what i could be doing wrong?

   
  
 
 



Re: Still too many files after running solr optimization

2011-09-28 Thread Vadim Kisselmann
We had a misunderstanding :)

Docs are the docs in the index.
Files are the files in the index directory (index segments).

During the optimization you don't delete docs if they are not flagged as
deleted.
But you merge your index and delete the files in your index directory, that's
right.

After a second optimize, the files which were still open for reading get
deleted.

Regards



2011/9/28 Manish Bafna manish.bafna...@gmail.com

 We tested it so many times.
 1st time we optimize, the new index file is created (merged one), but
 the existing index files are not deleted (because they might be still
 open for reading)
 2nd time optimize, other than the new index file, all else gets deleted.

 This is happening specifically on Windows.

 On Wed, Sep 28, 2011 at 8:23 PM, Vadim Kisselmann
 v.kisselm...@googlemail.com wrote:
  2011/9/28 Manish Bafna manish.bafna...@gmail.com
 
  Will it not merge the index?
 
 
  yes
 
 
  While merging on windows, the old index files dont get deleted.
  (Windows has an issue where the file opened for reading cannot be
  deleted)
  
  So, if you call optimize again, it will delete the older index files.
 
  no.
  during optimize you only delete docs, which are flagged as deleted. no
  matter how old they are.
  if your numDocs and maxDocs have the same number of Docs, you only
 rebuild
  and merge your index, but you delete nothing.
 
  Regards
 
 
 
 
  On Wed, Sep 28, 2011 at 6:43 PM, Vadim Kisselmann
  v.kisselm...@googlemail.com wrote:
   if numDocs und maxDocs have the same mumber of docs nothing will be
  deleted
   on optimize.
   You only rebuild your index.
  
   Regards
   Vadim
  
  
  
  
   2011/9/28 Kissue Kissue kissue...@gmail.com
  
   numDocs and maxDocs are same size.
  
   I was worried because when i used to use only Lucene for the same
  indexing,
   before optimization there are many files but after optimization i
 always
   end
   up with just 3 files in my index filder. Just want to find out if
 this
  was
   ok.
  
   Thanks
  
   On Wed, Sep 28, 2011 at 1:23 PM, Vadim Kisselmann 
   v.kisselm...@googlemail.com wrote:
  
why should the optimization reduce the number of files?
It happens only when you indexing docs with same unique key.
   
Have you differences in numDocs und maxDocs after optimize?
If yes:
how is your optimize command ?
   
Regards
Vadim
   
   
   
2011/9/28 Manish Bafna manish.bafna...@gmail.com
   
 Try to do optimize twice.
 The 2nd one will be quick and will delete lot of files.

 On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue 
 kissue...@gmail.com
  
 wrote:
  Hi,
 
  I am using solr 3.3. I noticed  that after indexing about 700,
 000
 records
  and running optimization at the end, i still have about 91
 files
  in
   my
 index
  directory. I thought that optimization was supposed to reduce
 the
number
 of
  files.
 
  My settings are the default that came with Solr (mergefactor,
 etc)
 
  Any ideas what i could be doing wrong?
 

   
  
  
 
 



Re: NRT and commit behavior

2011-09-26 Thread Vadim Kisselmann
Tirthankar,

Are you indexing 1. smaller docs or 2. books?
If 1: your caches are too big for your memory, as Erick already said.
Try to allocate 10GB for the JVM, leave 14GB for the OS disk cache, and make your
caches smaller (see the sketch after the link below).

If 2: read the blog posts on hathitrust.org:
http://www.hathitrust.org/blogs/large-scale-search
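A sketch of the JVM sizing for case 1, assuming the example Jetty start (the GC flag is only a suggestion):

java -Xms10g -Xmx10g -XX:+UseConcMarkSweepGC -jar start.jar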

Regards
Vadim


2011/9/24 Erick Erickson erickerick...@gmail.com

 No G. The problem is that number of documents isn't a reliable
 indicator of resource consumption. Consider the difference between
 indexing a twitter message and a book. I can put a LOT more docs
 of 140 chars on a single machine of size X than I can books.

 Unfortunately, the only way I know of is to test. Use something like
 jMeter of SolrMeter to fire enough queries at your machine to
 determine when you're over-straining resources and shard at that
 point (or get a bigger machine G)..

 Best
 Erick

 On Wed, Sep 21, 2011 at 8:24 PM, Tirthankar Chatterjee
 tchatter...@commvault.com wrote:
  Okay, but is there any number that if we reach on the index size or total
 docs in the index or the size of physical memory that sharding should be
 considered.
 
  I am trying to find the winning combination.
  Tirthankar
  -Original Message-
  From: Erick Erickson [mailto:erickerick...@gmail.com]
  Sent: Friday, September 16, 2011 7:46 AM
  To: solr-user@lucene.apache.org
  Subject: Re: NRT and commit behavior
 
  Uhm, you're putting  a lot of index into not very much memory. I really
 think you're going to have to shard your index across several machines to
 get past this problem. Simply increasing the size of your caches is still
 limited by the physical memory you're working with.
 
  You really have to put a profiler on the system to see what's going on.
 At that size there are too many things that it *could* be to definitively
 answer it with e-mails
 
  Best
  Erick
 
  On Wed, Sep 14, 2011 at 7:35 AM, Tirthankar Chatterjee 
 tchatter...@commvault.com wrote:
  Erick,
  Also, we had  our solrconfig where we have tried increasing the
 cache making the below value for autowarm count as 0 helps returning the
 commit call within the second, but that will slow us down on searches
 
 <filterCache
   class="solr.FastLRUCache"
   size="16384"
   initialSize="4096"
   autowarmCount="4096"/>

 <!-- Cache used to hold field values that are quickly accessible
      by document id.  The fieldValueCache is created by default
      even if not configured here.
   <fieldValueCache
     class="solr.FastLRUCache"
     size="512"
     autowarmCount="128"
     showItems="32"
   />
 -->

 <!-- queryResultCache caches results of searches - ordered lists of
      document ids (DocList) based on a query, a sort, and the range
      of documents requested.  -->
 <queryResultCache
   class="solr.LRUCache"
   size="16384"
   initialSize="4096"
   autowarmCount="4096"/>

 <!-- documentCache caches Lucene Document objects (the stored fields for each document).
      Since Lucene internal document ids are transient, this cache
      will not be autowarmed.  -->
 <documentCache
   class="solr.LRUCache"
   size="512"
   initialSize="512"
   autowarmCount="512"/>
 
  -Original Message-
  From: Tirthankar Chatterjee [mailto:tchatter...@commvault.com]
  Sent: Wednesday, September 14, 2011 7:31 AM
  To: solr-user@lucene.apache.org
  Subject: RE: NRT and commit behavior
 
  Erick,
  Here is the answer to your questions:
  Our index is 267 GB
  We are not optimizing...
  No we have not profiled yet to check the bottleneck, but logs indicate
 opening the searchers is taking time...
  Nothing except SOLR
  Total memory is 16GB tomcat has 8GB allocated Everything 64 bit OS and
  JVM and Tomcat
 
  -Original Message-
  From: Erick Erickson [mailto:erickerick...@gmail.com]
  Sent: Sunday, September 11, 2011 11:37 AM
  To: solr-user@lucene.apache.org
  Subject: Re: NRT and commit behavior
 
  Hmm, OK. You might want to look at the non-cached filter query stuff,
 it's quite recent.
  The point here is that it is a filter that is applied only after all of
 the less expensive filter queries are run, One of its uses is exactly ACL
 calculations. Rather than calculate the ACL for the entire doc set, it only
 calculates access for docs that have made it past all the other elements of
 the query See SOLR-2429 and note that it is a 3.4 (currently being
 released) only.
 
  As to why your commits are taking so long, I have no idea given that you
 really haven't given us much to work with.
 
  How big is your index? Are you optimizing? Have you profiled the
 application to see what the bottleneck is (I/O, CPU, etc?). What else is
 running on your machine? It's quite surprising that it takes that long. How
 much memory are you giving the JVM? etc...
 
  You might want to review:
  http://wiki.apache.org/solr/UsingMailingLists
 
  Best
  Erick
 
 
  On Fri, Sep 9, 2011 at 9:41 AM, 

Last successful build of Solr 4.0 and Near Realtime Search

2011-08-12 Thread Vadim Kisselmann
Hi folks,

I'm writing here again (besides Jira: SOLR-2565); maybe someone here can help:


I tested the nightly build #1595 with the new patch (2565), but NRT doesn't
work in my case.

I index 10 docs/sec, and it takes 1-30 sec. to see the results.
Same behavior when I update an existing document.

My addedDate is a timestamp (default=NOW). In the worst case a document
which I indexed has already been in my index for more than 30
seconds, but I still can't see it.

My Settings:
<autoCommit>
  <maxDocs>1000</maxDocs>
  <maxTime>6</maxTime>
</autoCommit>

<autoSoftCommit>
  <maxDocs>1</maxDocs>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Are my settings wrong, or do you need more details?
Should I use useColdSearcher (default=false)? Or set maxWarmingSearchers
higher than 2?
UPDATE:
If I only use autoSoftCommit and comment out autoCommit, it works.
But I should use the hard autoCommit, right?
Mark said yes, because only with hard commits are my docs in stable storage:
http://www.lucidimagination.com/blog/2011/07/11/benchmarking-the-new-solr-
‘near-realtime’-improvements/
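For what it's worth, the combination usually suggested for NRT on trunk is a hard autoCommit that does not open a new searcher plus a short soft commit interval. A sketch, assuming the openSearcher flag is available in this build (the times are only examples):

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>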

Regards
Vadim


Re: Unbuffered entity enclosing request can not be repeated Invalid chunk header

2011-08-12 Thread Vadim Kisselmann
Hi Markus,

thanks for your answer.
I'm using Solr 4.0 and Jetty now and will observe the behavior and my error logs
next week.
Tomcat can be a reason, we will see; I'll report back.

I'm indexing WITHOUT batches, one doc after another. But I would like to try out
batch indexing as well as
retrying faulty docs.
If you index one batch and one doc in the batch is corrupt, what happens
with the other 249 docs (total 250/batch)? Are they indexed and
updated when you retry indexing the batch, or does the complete batch fail?
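Since batch indexing came up, here is a minimal SolrJ sketch of sending one batch and retrying the whole batch on failure. The class name, field names, batch size and retry count are made-up examples, and it does not answer what Solr does with the remaining 249 docs when one doc is rejected:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {

    public static void main(String[] args) throws Exception {
        // URL and batch size are example values
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 250; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("text", "example content " + i);
            batch.add(doc);
        }

        // Send the whole batch; retry it a few times on failure.
        // Re-sending is safe because adds overwrite by uniqueKey.
        int attempts = 0;
        while (true) {
            try {
                server.add(batch);
                server.commit();
                break;
            } catch (Exception e) {
                if (++attempts >= 3) {
                    throw e; // give up after 3 attempts
                }
            }
        }
    }
}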

Regards
Vadim




2011/8/11 Markus Jelsma markus.jel...@openindex.io

 Hi,

 We see these errors too once in a while, but there is no real answer on the
 mailing list here except one user suspecting Tomcat is responsible
 (connection
 time outs).

 Another user proposed to limit the number of documents per batch but that,
 of
 course, increases the number of connections made. We do only 250 docs/batch
 to
 limit RAM usage on the client and start to see these errors very
 occasionally.
 There may be a coincidence.. or not.

 Anyway, it's really hard to reproduce if not impossible. It happens when
 connecting directly as well when connecting through a proxy.

 What you can do is simply retry the batch and it usually works out fine. At
 least you don't loose a batch in the process. We retry all failures at
 least a
 couple of times before giving up an indexing job.

 Cheers,

  Hello folks,
 
  i use solr 1.4.1 and every 2 to 6 hours i have indexing errors in my log
  files.
 
  on the client side:
  2011-08-04 12:01:18,966 ERROR [Worker-242] IndexServiceImpl - Indexing
  failed with SolrServerException.
  Details: org.apache.commons.httpclient.ProtocolException: Unbuffered
 entity
  enclosing request can not be repeated.:
  Stacktrace:
 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHtt
  pSolrServer.java:469) .
  .
  on the server side:
  INFO: [] webapp=/solr path=/update params={wt=javabinversion=1} status=0
  QTime=3
  04.08.2011 12:01:18 org.apache.solr.update.processor.LogUpdateProcessor
  finish
  INFO: {} 0 0
  04.08.2011 12:01:18 org.apache.solr.common.SolrException log
  SCHWERWIEGEND: org.apache.solr.common.SolrException: java.io.IOException:
  Invalid chunk header
  .
  .
  .
  i`m indexing ONE document per call, 15-20 documents per second, 24/7.
  what may be the problem?
 
  best regards
  vadim



Unbuffered entity enclosing request can not be repeated Invalid chunk header

2011-08-04 Thread Vadim Kisselmann
Hello folks,

I use Solr 1.4.1 and every 2 to 6 hours I have indexing errors in my log
files.

on the client side:
2011-08-04 12:01:18,966 ERROR [Worker-242] IndexServiceImpl - Indexing
failed with SolrServerException.
Details: org.apache.commons.httpclient.ProtocolException: Unbuffered entity
enclosing request can not be repeated.:
Stacktrace: 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:469)
.
.
on the server side:
INFO: [] webapp=/solr path=/update params={wt=javabinversion=1} status=0
QTime=3
04.08.2011 12:01:18 org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: {} 0 0
04.08.2011 12:01:18 org.apache.solr.common.SolrException log
SCHWERWIEGEND: org.apache.solr.common.SolrException: java.io.IOException:
Invalid chunk header
.
.
.
I'm indexing ONE document per call, 15-20 documents per second, 24/7.
What may be the problem?

best regards
vadim


Re: Replication slows down massively during high load

2011-03-17 Thread Vadim Kisselmann
Hello Shawn,

Primary assumption:  You have a 64-bit OS and a 64-bit JVM.

Yep, it's running 64-bit Linux with a 64-bit JVM.

It sounds to me like you're I/O bound, because your machine cannot
keep enough of your index in RAM.  Relative to your 100GB index, you
only have a maximum of 14GB of RAM available to the OS disk cache,
since Java's heap size is 10GB.

The load test seems to be more CPU bound than I/O bound. 
All cores are fully busy and iostat says that there isn't 
much more disk I/O going on than without load test. The 
index is on a RAID10 array with four disks.

How much disk space do all of the index files that end in x take up?
 I would venture a guess that it's significantly more than 14GB.  On
Linux, you could do this command to tally it quickly:

# du -hc *x

27G total

# du -hc `ls | egrep -v tvf|fdt`

51G total

If you installed enough RAM so the disk cache can be much larger than
the total size of those files ending in x, you'd probably stop
having these performance issues.
Alternatively, you could take steps to reduce the size of your index,
or perhaps add more machines to go distributed.

Unfortunately, this doesn't seem to be the problem. 
The queries themselves are running fine. The problem 
is that the replication is crawling when there are 
many queries going on and that the replication speed 
stays low even after the load is gone.



Cheers
Vadim


Re: Replication slows down massively during high load

2011-03-17 Thread Vadim Kisselmann
On Mar 17, 2011, at 3:19 PM, Shawn Heisey wrote:

On 3/17/2011 3:43 AM, Vadim Kisselmann wrote:
Unfortunately, this doesn't seem to be the problem. The queries
themselves are running fine. The problem is that the replications is
crawling when there are many queries going on and that the replication
speed stays low even after the load is gone.

If you run iostat 5 what are typical values on each iteration for
the various CPU states while you're doing load testing and replication
at the same time?  In particular, %iowait is important.



CPU stats from top (iostat doesn't seem to show CPU load correctly):

90.1%us,  4.5%sy,  0.0%ni,  5.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Seems like I/O is not the bottleneck here.

Other interesting thing: When Solr starts its replication under heavy
load, it tries to download the whole index from master.

From /solr/admin/replication/index.jsp:

Current Replication Status

Start Time: Thu Mar 17 15:57:20 CET 2011
Files Downloaded: 9 / 163
Downloaded: 83,04 MB / 97,75 GB [0.0%]
Downloading File: _d5x.nrm, Downloaded: 86,82 KB / 86,82 KB [100.0%]
Time Elapsed: 419s, Estimated Time Remaining: 504635s, Speed: 202,94 
KB/s


Re: Replication slows down massively during high load

2011-03-17 Thread Vadim Kisselmann
Hi Bill,

 You could always rsync the index dir and reload (old scripts).

I used them previously but was getting problems with them. The
application querying the Solr doesn't cause enough load on it to
trigger the issue. Yet.

 But this is still something we should investigate.

Indeed :-)

 See if the Nic is configured right? Routing? Speed of transfer?

Network doesn't seem to be the problem. Testing with iperf from slave
to master yields a full gigabit, even while Solrmeter is hammering the
server.

 Bill Bell

Vadim


Replication slows down massively during high load

2011-03-16 Thread Vadim Kisselmann
Hi everyone,

I have Solr running on one master and two slaves (load balanced) via
Solr 1.4.1 native replication.

If the load is low, both slaves replicate with around 100MB/s from master.

But when I use Solrmeter (100-400 queries/min) for load tests (over
the load balancer), the replication slows down to an unacceptable
speed, around 100KB/s (at least that's whats the replication page on
/solr/admin says).

Going to a slave directly without load balancer yields the same result
for the slave under test:

Slave 1 gets hammered with Solrmeter and the replication slows down to 100KB/s.
At the same time, Slave 2 with only 20-50 queries/min without the load
test has no problems. It replicates with 100MB/s and the index version
is 5-10 versions ahead of Slave 1.

The replications stays in the 100KB/s range even after the load test
is over until the application server is restarted. The same issue
comes up under both Tomcat and Jetty.

The setup looks like this:

- Same hardware for all servers: Physical machines with quad core
CPUs, 24GB RAM (JVM starts up with -XX:+UseConcMarkSweepGC -Xms10G
-Xmx10G)
- Index size is about 100GB with 40M docs
- Master commits every 10 min/10k docs
- Slaves polls every minute

I checked this:

- Changed network interface; same behavior
- Increased thread pool size from 200 to 500 and queue size from 100
to 500 in Tomcat; same behavior
- Both disk and network I/O are not bottlenecked. Disk I/O went down
to almost zero after every query in the load test got cached. Network
isn't doing much and can put through almost an GBit/s with iPerf
(network throughput tester) while Solrmeter is running.

Any ideas what could be wrong?


Best Regards
Vadim