Re: solr nutch url indexing
Uri Boness wrote: Well... yes, it's a tool that Nutch ships with. It also ships with an example Solr schema which you can use. hi, is there any documentation to understand what is going on in the schema?

<requestHandler name="/nutch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">content^0.5 anchor^1.0 title^5.2</str>
    <str name="pf">content^0.5 anchor^1.5 title^5.2 site^1.5</str>
    <str name="fl">url</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">100</int>
    <bool hl="true"/>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">title url content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
  </lst>
</requestHandler>
RE: encoding problem
Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I access the JVM??? Regards Bern -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Wednesday, 26 August 2009 5:10 PM To: solr-user@lucene.apache.org Subject: Re: encoding problem On Wed, Aug 26, 2009 at 10:24 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: We have an encoding problem with our solr application. That is, non-ASCII chars display fine in SOLR, but as googledegook in our application. Our tomcat server.xml file already contains URIEncoding="UTF-8" under the relevant connector. A google search reveals that I should set the encoding for the JVM, but I have no idea how to do this. I'm running Windows, and there is no tomcat process in my Windows Services. Add the following parameter to the JVM: -Dfile.encoding=UTF-8 -- Regards, Shalin Shekhar Mangar.
Re: encoding problem
On Wed, Aug 26, 2009 at 12:42 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I access the JVM??? When you execute the java executable, just add -Dfile.encoding=UTF-8 as a command line argument to the executable. How are you consuming Solr? You mentioned there is no tomcat, is your solr client a desktop java application? -- Regards, Shalin Shekhar Mangar.
Re: Exact word search
On Tue, Aug 25, 2009 at 10:40 AM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi, Can anyone help me with the below scenario? Scenario 1: Assume that I give Google as input string. I am using Carrot with Solr; Carrot is for front-end display purposes. It seems like Carrot is the one making the queries to Solr? In that case, this question may be better suited for Carrot users/developers. The issue is: assuming I give BHASKAR as input string, it should give me search results pertaining to BHASKAR only. Select * from MASTER where name = 'Bhaskar'; Example: it should not display search results such as ChandarBhaskar or BhaskarC. It should display Bhaskar only. That is easy with Solr: make a query like field-name:Bhaskar. Make sure that the field is not tokenized, i.e. string type in schema.xml. Scenario 2: Select * from MASTER where name like '%BHASKAR%'; It should display records containing the word BHASKAR. Ex: Bhaskar, ChandarBhaskar, BhaskarC, Bhaskarabc. Leading wildcards are not supported. However there are alternate ways of doing it. Create two fields; keep one as a normal string type and use a KeywordTokenizer and ReverseFilter on the other. Make one field a copyField of the other. Perform a prefix search on both fields. -- Regards, Shalin Shekhar Mangar.
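A sketch of the two-field setup Shalin describes, as it might look in schema.xml (the type and field names here are illustrative, not from the thread):

```xml
<!-- hypothetical names; the reversed field enables suffix matching via prefix queries -->
<fieldType name="string_rev" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ReverseStringFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name" type="string" indexed="true" stored="true"/>
<field name="name_rev" type="string_rev" indexed="true" stored="false"/>
<copyField source="name" dest="name_rev"/>
```

A query such as name:Bhaskar* OR name_rev:raksahB* would then match values that start or end with Bhaskar, approximating the LIKE '%BHASKAR%' behaviour (true infix matches would still need something more, e.g. n-grams).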
RE: encoding problem
Thanks for your quick reply, Shalin. Tomcat is running on my Windows machine, but does not appear in Windows Services (as I was expecting it should ... am I wrong?). I'm running it from a startup.bat on my desktop - see below. Do I add the -Dfile line to the startup.bat? SOLR is part of the repository software that we are running. Thanks! BERN

Startup.bat
-----------
@echo off
if "%OS%" == "Windows_NT" setlocal
rem ---------------------------------------------------------------------------
rem Start script for the CATALINA Server
rem
rem $Id: startup.bat 302918 2004-05-27 18:25:11Z yoavs $
rem ---------------------------------------------------------------------------

rem Guess CATALINA_HOME if not defined
set CURRENT_DIR=%cd%
if not "%CATALINA_HOME%" == "" goto gotHome
set CATALINA_HOME=%CURRENT_DIR%
if exist "%CATALINA_HOME%\bin\catalina.bat" goto okHome
cd ..
set CATALINA_HOME=%cd%
cd %CURRENT_DIR%
:gotHome
if exist "%CATALINA_HOME%\bin\catalina.bat" goto okHome
echo The CATALINA_HOME environment variable is not defined correctly
echo This environment variable is needed to run this program
goto end
:okHome

set EXECUTABLE=%CATALINA_HOME%\bin\catalina.bat

rem Check that target executable exists
if exist "%EXECUTABLE%" goto okExec
echo Cannot find %EXECUTABLE%
echo This file is needed to run this program
goto end
:okExec

rem Get remaining unshifted command line arguments and save them in the
set CMD_LINE_ARGS=
:setArgs
if "%1"=="" goto doneSetArgs
set CMD_LINE_ARGS=%CMD_LINE_ARGS% %1
shift
goto setArgs
:doneSetArgs

call "%EXECUTABLE%" start %CMD_LINE_ARGS%

:end
Re: shingle filter
On Tue, Aug 25, 2009 at 4:24 AM, Joe Calderon calderon@gmail.com wrote: hello *, I'm currently faceting on a shingled field to obtain popular phrases and it's working well; however, I'd like to limit the number of shingles that get created. solr.ShingleFilterFactory supports maxShingleSize - can it be made to support a minimum as well? Can someone point me in the right direction? There is only maxShingleSize right now. The other configurable attribute is outputUnigrams, which controls whether or not unigrams are added to the index. If you want to add support for a minimum size, I think you can make the changes in ShingleFilter.fillShingleBuffer(). Create an issue in JIRA and someone who knows more about shingles can help out. -- Regards, Shalin Shekhar Mangar.
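For context, a shingled field type using the attributes that do exist might look like this (the field type name is illustrative):

```xml
<fieldType name="shingled" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- emit shingles of up to 3 tokens; outputUnigrams="false" suppresses single tokens -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="false"/>
  </analyzer>
</fieldType>
```

There is no corresponding minShingleSize attribute at this point, which is what the thread proposes adding.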
Re: encoding problem
On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks for your quick reply, Shalin. Tomcat is running on my Windows machine, but does not appear in Windows Services (as I was expecting it should ... am I wrong?). I'm running it from a startup.bat on my desktop - see below. Do I add the Dfile line to the startup.bat? SOLR is part of the repository software that we are running. Tomcat respects an environment variable called JAVA_OPTS through which you can pass any jvm argument (e.g. heap size, file encoding). Set JAVA_OPTS=-Dfile.encoding=UTF-8 either through the GUI or by adding the following to startup.bat: set JAVA_OPTS=-Dfile.encoding=UTF-8 -- Regards, Shalin Shekhar Mangar.
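Concretely, the edit to startup.bat might look like this (placement is flexible as long as it runs before catalina.bat is invoked):

```bat
rem near the top of startup.bat, after "@echo off"
set JAVA_OPTS=%JAVA_OPTS% -Dfile.encoding=UTF-8
```

Appending to any existing %JAVA_OPTS% (rather than overwriting it) preserves other JVM settings such as heap size.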
Re: solr 1.4: extending StatsComponent to recognize localparm {!ex}
Thanks for that. It works now ;-) Erik Hatcher-4 wrote: On Aug 25, 2009, at 6:35 PM, Britske wrote: Moreover, I can't seem to find the actual code in FacetComponent or anywhere else for that matter where the {!ex}-param case is treated. I assume it's in FacetComponent.refineFacets but I can't seem to get a grip on it.. Perhaps it's late here.. So, would someone care to shed some light on how this might be done? (I only need some general directions I hope..) It's in SimpleFacets, which does a call to QueryParsing.getLocalParams(). Erik -- View this message in context: http://www.nabble.com/solr-1.4%3A-extending-StatsComponent-to-recognize-localparm-%7B%21ex%7D-tp25143428p25148403.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Create new core from existing
Check this: http://wiki.apache.org/solr/CoreAdmin - when you create a core you are allowed to use the same instance dir as the old core; just ensure that you give a different dataDir. On Wed, Aug 26, 2009 at 3:05 PM, pavan kumar donepudipavan.donep...@gmail.com wrote: Paul, Can you please guide me on which option I need to use to do this and, if possible, any sample or a wiki link. Thanks Regards, Pavan 2009/8/26 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com The coreadmin would not copy your data. However, it is possible to create another core using the same config and schema. On Wed, Aug 26, 2009 at 1:51 PM, pavan kumar donepudipavan.donep...@gmail.com wrote: hi everyone Is there any way to create a new solr core from the existing core using CoreAdminHandler? I want the instance directory to be created by copying the files from the existing core, and the data directory path can be provided through the dataDir querystring. Regards, Pavan -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- - Noble Paul | Principal Engineer| AOL | http://aol.com
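Such a CREATE call through the CoreAdminHandler might look like this (the host, core names, and data path are placeholders):

```text
http://localhost:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=core0&dataDir=/path/to/core1/data
```

The new core reuses the existing core's conf/ directory but writes its index to the dataDir given in the query string, which matches what Noble describes; the index data itself is not copied.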
Re: solr nutch url indexing
Do you mean the schema or the solrconfig.xml? The request handler is configured in solrconfig.xml and you can find out more about this particular configuration at http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=(CategorySolrRequestHandler)|((CategorySolrRequestHandler)). To understand the schema better, you can read http://wiki.apache.org/solr/SchemaXml Uri last...@gmail.com wrote: Uri Boness wrote: Well... yes, it's a tool that Nutch ships with. It also ships with an example Solr schema which you can use. hi, is there any documentation to understand what is going on in the schema?

<requestHandler name="/nutch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">content^0.5 anchor^1.0 title^5.2</str>
    <str name="pf">content^0.5 anchor^1.5 title^5.2 site^1.5</str>
    <str name="fl">url</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">100</int>
    <bool hl="true"/>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">title url content</str>
    <str name="f.title.hl.fragsize">0</str>
    <str name="f.title.hl.alternateField">title</str>
    <str name="f.url.hl.fragsize">0</str>
    <str name="f.url.hl.alternateField">url</str>
    <str name="f.content.hl.fragmenter">regex</str>
  </lst>
</requestHandler>
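With a handler registered under a name starting with "/", queries can typically be sent to it directly by path; for example (the host and search terms are illustrative):

```text
http://localhost:8983/solr/nutch?q=apache+nutch&wt=xml
```

Because defType=dismax and qf/pf are set in the handler's defaults, the plain q terms are searched across the content, anchor, and title fields with the configured boosts, with highlighting already switched on.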
HTML decoder is splitting tokens
Hi. When indexing the string G&uuml;nther with HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two tokens, Gü and nther. Is this a bug, or am I doing something wrong? (Using a Solr nightly from 2009-05-29) Anders.
Reason to change the xml files in solr
For the installation of the apache solr integration module in Drupal we need to install Solr. The must-do thing is that we need to replace the Solr schema.xml and configure.xml files with the files in the apache solr integration module. Can anybody explain the reason behind this change? -- View this message in context: http://www.nabble.com/Reason-to-change-the-xml-files-in-solr-tp25151354p25151354.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: encoding problem
If you are complaining about a Web Application (other than SOLR) (probably behind the Apache HTTPD) having an encoding problem - try to troubleshoot it with Mozilla Firefox + the Live Http Headers plugin. Look at the Content-Encoding HTTP response headers, and don't forget about the meta http-equiv... tag inside the HTML... -Fuad http://www.tokenizer.org -Original Message- From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au] Sent: August-26-09 12:55 AM To: 'solr-user@lucene.apache.org' Subject: encoding problem We have an encoding problem with our solr application. That is, non-ASCII chars display fine in SOLR, but as googledegook in our application. Our tomcat server.xml file already contains URIEncoding="UTF-8" under the relevant connector. A google search reveals that I should set the encoding for the JVM, but I have no idea how to do this. I'm running Windows, and there is no tomcat process in my Windows Services. TIA Bernadette Houghton, Library Business Applications Developer Deakin University Geelong Victoria 3217 Australia. Phone: 03 5227 8230 International: +61 3 5227 8230 Fax: 03 5227 8000 International: +61 3 5227 8000 MSN: bern_hough...@hotmail.com Email: bernadette.hough...@deakin.edu.au Website: http://www.deakin.edu.au Deakin University CRICOS Provider Code 00113B (Vic) Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone. Deakin University does not warrant that this email and any attachments are error or virus free
What makes a function query count as a match or not?
I haven't been able to find what makes a function query count as a match when used as part of a boolean query with Occur.MUST. A Term query is simple: if the term is not found, it doesn't count as a match. What's the equivalent for a function query? A score of zero (or less than zero, as implied by the source code for explain in Lucene's boolean query)? Something else?
Re: HTML decoder is splitting tokens
Hi Anders, Sorry, I don't know if this is a bug or a feature, but I'd like to show an alternate way if you like. In Solr trunk, HTMLStripWhitespaceTokenizerFactory is marked as deprecated. Instead, HTMLStripCharFilterFactory plus an arbitrary TokenizerFactory are encouraged. And I'd recommend you use MappingCharFilterFactory to convert character references to real characters. That is, you have:

<fieldType name="textHtml" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

where the contents of mapping.txt are:

"&uuml;" => "ü"
"&auml;" => "ä"
"&iuml;" => "ï"
"&euml;" => "ë"
"&ouml;" => "ö"
:
:

Then run analysis.jsp and see the result. Thank you, Koji

Anders Melchiorsen wrote: Hi. When indexing the string G&uuml;nther with HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two tokens, Gü and nther. Is this a bug, or am I doing something wrong? (Using a Solr nightly from 2009-05-29) Anders.
Solr admin url for example gives 404
Hello all, When I start up Solr from the example directory using start.jar, it seems to start up, but when I go to the localhost admin url (http://localhost:8983/solr/admin) I get a 404 (see message appended below). Has the url for the Solr admin changed? Tom

Tom Burton-West
---
Here is the message I get with the 404:

HTTP ERROR: 404 NOT_FOUND
RequestURI=/solr/admin
Powered by jetty:// http://jetty.mortbay.org

Steps to reproduce the problem:
1. get the latest Solr from svn (r808058)
2. run ant clean test (all tests pass)
3. cd ./example
4. start solr: $ java -jar start.jar
   2009-08-26 12:08:08.300::INFO: Logging to STDERR via org.mortbay.log.StdErrLog
   2009-08-26 12:08:08.472::INFO: jetty-6.1.3
   2009-08-26 12:08:08.519::INFO: Started SocketConnector @ 0.0.0.0:8983
5. go to browser and try to look at the admin panel: http://localhost:8983/solr/admin
Re: Solr admin url for example gives 404
Hello! Try running ant example and then run Solr. -- Regards, Rafał Kuć Hello all, When I start up Solr from the example directory using start.jar, it seems to start up, but when I go to the localhost admin url (http://localhost:8983/solr/admin) I get a 404 (See message appended below). Has the url for the Solr admin changed? Tom Tom Burton-West --- Here is the message I get with the 404: HTTP ERROR: 404 NOT_FOUND RequestURI=/solr/admin Powered by jetty://http://jetty.mortbay.org Steps to reproduce the problems: 1 get the latest Solr from svn (R 808058) 2 run ant clean test (all tests pass) 3 cd ./example 4. start solr $ java -jar start.jar 2009-08-26 12:08:08.300::INFO: Logging to STDERR via org.mortbay.log.StdErrLog 2009-08-26 12:08:08.472::INFO: jetty-6.1.3 2009-08-26 12:08:08.519::INFO: Started SocketConnector @ 0.0.0.0:8983 5. go to browser and try to look at admin panel: http://localhost:8983/solr/admin
JDWP Error
The servlet container (Resin) where I deploy Solr shows:

ERROR: transport error 202: bind failed: Address already in use
ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:690]
FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)
ERROR: transport error 202: bind failed: Address already in use
ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:690]
FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)

Then, when we want to stop Resin, it doesn't work. Any advice? thx -- Lici
SolrJ and Solr web simultaneously?
Is Solr like a RDBMS in that I can have multiple programs querying and updating the index at once, and everybody else will see the updates after a commit, or do I have to do something explicit to see others' updates? Does it matter whether they're using the web interface, SolrJ with a CommonsHttpSolrServer, or SolrJ with an EmbeddedSolrServer? -- http://www.linkedin.com/in/paultomblin
Re: SolrJ and Solr web simultaneously?
Once a commit occurs, all data added before it (by any and all clients) becomes visible to all searches henceforth. The web interface has direct access to Solr, and SolrJ remotely accesses that Solr. EmbeddedSolrServer is something that few people should actually use. It's mostly for embedding Solr without running Solr as a server, which is a somewhat rare need. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server On 8/26/09 1:14 PM, Paul Tomblin ptomb...@xcski.com wrote: Is Solr like a RDBMS in that I can have multiple programs querying and updating the index at once, and everybody else will see the updates after a commit, or do I have to something explicit to see others updates? Does it matter whether they're using the web interface, SolrJ with a CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer? -- http://www.linkedin.com/in/paultomblin
RE: JDWP Error
JPDA/JDWP are for remote debugging of the Sun JVM... It shouldn't be SOLR related... check the configs of Resin... -Fuad http://www.tokenizer.org -Original Message- From: Licinio Fernández Maurelo [mailto:licinio.fernan...@gmail.com] Sent: August-26-09 12:49 PM To: solr-user@lucene.apache.org Subject: JDWP Error The servlet container (resin) where i deploy solr shows : ERROR: transport error 202: bind failed: Address already in use ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510) JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:690] FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197) ERROR: transport error 202: bind failed: Address already in use ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510) JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:690] FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197) then, when we want to stop resin it doesn't works, any advice? thx -- Lici
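"Address already in use" from JDWP usually means the debug agent is configured twice, or two JVM instances are started with the same debug port. As a sketch of where to look, assuming Resin 3.x's resin.conf format (the jvm-arg entries and port number below are illustrative, not taken from this thread):

```xml
<!-- resin.conf (Resin 3.x): hypothetical debug settings; remove these,
     or give each Resin instance a distinct address, to avoid the bind conflict -->
<jvm-arg>-Xdebug</jvm-arg>
<jvm-arg>-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005</jvm-arg>
```

If a stale JVM is still holding the debug port, Resin's stop script may also fail until that process is killed manually.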
Pattern matching in Solr
Hi, Can anyone help me with the below scenario? Scenario 1: Assume that I give Google as input string. I am using Carrot with Solr; Carrot is for front-end display purposes. The issue is: assuming I give BHASKAR as input string, it should give me search results pertaining to BHASKAR only. Select * from MASTER where name = 'Bhaskar'; Example: it should not display search results such as ChandarBhaskar or BhaskarC. It should display Bhaskar only. Scenario 2: Select * from MASTER where name like '%BHASKAR%'; It should display records containing the word BHASKAR. Ex: Bhaskar ChandarBhaskar BhaskarC Bhaskarabc. How to achieve Scenario 1 in Solr? Regards Bhaskar
RE: SolrJ and Solr web simultaneously?
I have the same situation now. If I don't want to use an http connection, then I need to use EmbeddedSolrServer - is that correct? We have master/slave Solr; the applications use the slaves for search. The master only takes new index data from the database, and the slaves pull the new index using snappuller/snapinstaller. I don't want (or try not) to use an http connection from the database to the Solr master because of network latency (very slow). Any suggestions? Francis -Original Message- From: Smiley, David W. [mailto:dsmi...@mitre.org] Sent: Wednesday, August 26, 2009 10:23 AM To: solr; Paul Tomblin Subject: Re: SolrJ and Solr web simultaneously? Once a commit occurs, all data added before it (by any all clients) becomes visible to all searches henceforth. The web interface has direct access to Solr, and SolrJ remotely accesses that Solr. SolrEmbeddedSolrServer is something that few people should actually use. It's mostly for embedding Solr without running Solr as a server, which is a somewhat rare need. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server On 8/26/09 1:14 PM, Paul Tomblin ptomb...@xcski.com wrote: Is Solr like a RDBMS in that I can have multiple programs querying and updating the index at once, and everybody else will see the updates after a commit, or do I have to something explicit to see others updates? Does it matter whether they're using the web interface, SolrJ with a CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer? -- http://www.linkedin.com/in/paultomblin
Re: SolrJ and Solr web simultaneously?
See my response to Paul Tomblin. You could use the existing DataImportHandler SqlEntityProcessor for DB access. The DIH framework is fairly extensible. BTW, I wouldn't immediately dismiss using HTTP to give data to Solr just because you believe it will be slow without having tried it. Using SolrJ with StreamingUpdateSolrServer configured with multiple threads and using the default binary format is pretty darned fast. Don't knock it till you've tried it. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server On 8/26/09 1:41 PM, Francis Yakin fya...@liquid.com wrote: I have the same situation now. If I don't want to use http connection, so I need to use EmbeddedSolrServer that what I think I need correct? We have Master/slaves solr, the applications use slaves for search. The Master only taking the new index from Database and slaves will pull the new index using snappuller/snapinstaller. I don't want or try not to use http connection from Database to Solr Master because of network latency( very slow). Any suggestions? Francis -Original Message- From: Smiley, David W. [mailto:dsmi...@mitre.org] Sent: Wednesday, August 26, 2009 10:23 AM To: solr; Paul Tomblin Subject: Re: SolrJ and Solr web simultaneously? Once a commit occurs, all data added before it (by any all clients) becomes visible to all searches henceforth. The web interface has direct access to Solr, and SolrJ remotely accesses that Solr. SolrEmbeddedSolrServer is something that few people should actually use. It's mostly for embedding Solr without running Solr as a server, which is a somewhat rare need. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server On 8/26/09 1:14 PM, Paul Tomblin ptomb...@xcski.com wrote: Is Solr like a RDBMS in that I can have multiple programs querying and updating the index at once, and everybody else will see the updates after a commit, or do I have to something explicit to see others updates? 
Does it matter whether they're using the web interface, SolrJ with a CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer? -- http://www.linkedin.com/in/paultomblin
RE: SolrJ and Solr web simultaneously?
I don't want or try not to use http connection from Database to Solr Master because of network latency( very slow). Network latency does not play any role here; throughput is more important. With a separate SOLR instance on a separate box, and with a separate java application (SOLR-bridge) querying the database and using SolrJ, latency will be 1 second (for instance), but you can fine-tune performance by allocating the necessary number of threads (depends on latency of SOLR and Oracle, average doc size, etc.), JDBC connections, etc. - and you can reach a throughput of thousands of docs per second. DIHs only simplify some stuff for total beginners... In addition, you will have the nice Admin screen of a standalone SOLR-master. -Fuad http://www.tokenizer.org -Original Message- From: Francis Yakin [mailto:fya...@liquid.com] Sent: August-26-09 1:41 PM To: 'solr-user@lucene.apache.org'; Paul Tomblin Subject: RE: SolrJ and Solr web simultaneously? I have the same situation now. If I don't want to use http connection, so I need to use EmbeddedSolrServer that what I think I need correct? We have Master/slaves solr, the applications use slaves for search. The Master only taking the new index from Database and slaves will pull the new index using snappuller/snapinstaller. I don't want or try not to use http connection from Database to Solr Master because of network latency( very slow). Any suggestions? Francis -Original Message- From: Smiley, David W. [mailto:dsmi...@mitre.org] Sent: Wednesday, August 26, 2009 10:23 AM To: solr; Paul Tomblin Subject: Re: SolrJ and Solr web simultaneously? Once a commit occurs, all data added before it (by any all clients) becomes visible to all searches henceforth. The web interface has direct access to Solr, and SolrJ remotely accesses that Solr. SolrEmbeddedSolrServer is something that few people should actually use. It's mostly for embedding Solr without running Solr as a server, which is a somewhat rare need.
~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server On 8/26/09 1:14 PM, Paul Tomblin ptomb...@xcski.com wrote: Is Solr like a RDBMS in that I can have multiple programs querying and updating the index at once, and everybody else will see the updates after a commit, or do I have to something explicit to see others updates? Does it matter whether they're using the web interface, SolrJ with a CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer? -- http://www.linkedin.com/in/paultomblin
Re: SolrJ and Solr web simultaneously?
Is Solr like a RDBMS in that I can have multiple programs querying and updating the index at once, and everybody else will see the updates after a commit, or do I have to something explicit to see others updates? Yes - everyone searches the existing index; writes to the index (core) only become visible once they are committed. None of the searches will fetch uncommitted data. Does it matter whether they're using the web interface, SolrJ with a CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer? Absolutely not. All of these are just different ways to access the Solr server; the underlying implementation of searching the index and writing to the index does not change in either case. Cheers Avlesh On Wed, Aug 26, 2009 at 10:44 PM, Paul Tomblin ptomb...@xcski.com wrote: Is Solr like a RDBMS in that I can have multiple programs querying and updating the index at once, and everybody else will see the updates after a commit, or do I have to something explicit to see others updates? Does it matter whether they're using the web interface, SolrJ with a CommonsHttpSolrServer or SolrJ with a EmbeddedSolrServer? -- http://www.linkedin.com/in/paultomblin
Problem using replication in 8/25/09 nightly build of 1.4
Hi Everyone, When trying to utilize the new HTTP based replication built into Solr 1.4 I encounter a problem. When I view the replication admin page on the slave, all of the master values are null, i.e. Replicatable Index Version: null, Generation: null | Latest Index Version: null, Generation: null. Despite these missing values the two seem to be talking over HTTP successfully (if I shut down the master, the slave replication page starts exploding with an NPE). When I hit http://solr/replication?command=indexversion&wt=xml I get the following...

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">13</int>
  </lst>
  <long name="indexversion">0</long>
  <long name="generation">0</long>
</response>

However in the admin/replication UI on the master I see...

Index Version: 1250525534711, Generation: 1778

Any idea what I'm doing wrong or how I could begin to diagnose? I am using the 8/25 nightly build of Solr with the example solrconfig.xml provided. The only modifications to the config have been to uncomment the master/slave replication sections and remove the data directory location line so it falls back to solr.home/data. Also, if it's relevant, this index was originally created in Solr 1.3. Thanks, Ron Ellis
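For reference, the uncommented master/slave sections being described are of this general shape in the Solr 1.4 example solrconfig.xml (the master host name and poll interval here are placeholders):

```xml
<!-- solrconfig.xml on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- solrconfig.xml on the slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

If the master's own solrconfig.xml ends up with the slave section (or both sections) enabled, or the slave's masterUrl points at the wrong core, the slave-side master values can show up as null even though HTTP connectivity works.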
RE: SolrJ and Solr web simultaneously?
Do you have a firewall between the DB and the possible SOLR-Master instance? Do you have a firewall between the client application and the DB? Such a configuration is strange... By default firewalls allow access to port 80; try to set port 80 for SOLR-Tomcat and/or configure an AJP mapping for the front-end HTTPD which you might have. BTW, Apache HTTPD with SOLR supports HTTP caching for SOLR slaves... 1. SolrJ does not provide multithreading, but an instance of CommonsHttpSolrServer is thread-safe. Developers need to implement a multithreaded application. 2. SolrJ does not use JDBC; developers need to implement that... It requires some Java coding; it is not the out-of-the-box DataImportHandler. Suppose you have 2 quad-cores: why run single-threaded if we can run 8 threads... or why wait 5 seconds for a response from SOLR if we can use an additional 32 threads working with the DB at the same time... and why share I/O between SOLR and DB? Diversify, lower risks; having SOLR and DB on the same box is extremely unsafe... -Fuad -Original Message- From: Francis Yakin [mailto:fya...@liquid.com] Sent: August-26-09 2:25 PM To: 'solr-user@lucene.apache.org' Subject: RE: SolrJ and Solr web simultaneously? Thanks. The issue we have actually could be a firewall issue more likely than network latency; that's why we try to avoid using an http connection. Fixing the firewall is not an option right now. We have around 3 million docs to load from the DB to the Solr master (first initial load only) and subsequently we will actively add new docs to Solr after the initial load. We prefer to use a JDBC connection, so if SolrJ uses a JDBC connection that might be useful. I also like the multi-threading option from SolrJ. So, since we want the Solr master running as a server, is EmbeddedSolrServer not a better approach for this? Francis -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: Wednesday, August 26, 2009 10:56 AM To: solr-user@lucene.apache.org Subject: RE: SolrJ and Solr web simultaneously?
I don't want or try not to use http connection from Database to Solr Master because of network latency( very slow). network latency does not play any role here; throughput is more important. With separate SOLR instance on a separate box, and with separate java application (SOLR-bridge) querying database and using SolrJ, letency will be 1 second (for instance), but you can fine-tune performance by allocating necessary amount of threads (depends on latency of SOLR and Oracle, average doc size, etc), JDBC connections, etc. - and you can reach thousands docs per second throughput. DIHs only simplify some staff for total beginners... In addition, you will have nice Admin screen of standalone SOLR-master. -Fuad http://www.tokenizer.org -Original Message- From: Francis Yakin [mailto:fya...@liquid.com] Sent: August-26-09 1:41 PM To: 'solr-user@lucene.apache.org'; Paul Tomblin Subject: RE: SolrJ and Solr web simultaneously? I have the same situation now. If I don't want to use http connection, so I need to use EmbeddedSolrServer that what I think I need correct? We have Master/slaves solr, the applications use slaves for search. The Master only taking the new index from Database and slaves will pull the new index using snappuller/snapinstaller. I don't want or try not to use http connection from Database to Solr Master because of network latency( very slow). Any suggestions? Francis -Original Message- From: Smiley, David W. [mailto:dsmi...@mitre.org] Sent: Wednesday, August 26, 2009 10:23 AM To: solr; Paul Tomblin Subject: Re: SolrJ and Solr web simultaneously? Once a commit occurs, all data added before it (by any all clients) becomes visible to all searches henceforth. The web interface has direct access to Solr, and SolrJ remotely accesses that Solr. SolrEmbeddedSolrServer is something that few people should actually use. It's mostly for embedding Solr without running Solr as a server, which is a somewhat rare need. 
~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server On 8/26/09 1:14 PM, Paul Tomblin <ptomb...@xcski.com> wrote: Is Solr like an RDBMS in that I can have multiple programs querying and updating the index at once, and everybody else will see the updates after a commit, or do I have to do something explicit to see others' updates? Does it matter whether they're using the web interface, SolrJ with a CommonsHttpSolrServer, or SolrJ with an EmbeddedSolrServer? -- http://www.linkedin.com/in/paultomblin
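Fuad's two numbered points above describe a pattern rather than an API: SolrJ supplies no threading of its own, but a single CommonsHttpSolrServer instance is thread-safe, so the application shares one instance across a worker pool that pulls rows over JDBC and pushes documents to Solr. A minimal sketch of that worker-pool shape using only the JDK; the `IndexSink` interface is a hypothetical stand-in for the shared SolrJ server instance, and the docs are simulated strings:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelIndexer {
    // Hypothetical stand-in for the shared, thread-safe CommonsHttpSolrServer.
    // In a real bridge, add(doc) would build a SolrInputDocument and send it.
    interface IndexSink {
        void add(String doc);
    }

    // Fan nDocs "documents" out across nThreads workers, all writing to the
    // same shared sink; returns the number of docs the sink received.
    static int run(int nDocs, int nThreads) throws InterruptedException {
        ConcurrentLinkedQueue<String> received = new ConcurrentLinkedQueue<>();
        IndexSink sink = received::add;  // thread-safe target, shared by all workers
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (int i = 0; i < nDocs; i++) {
            final String doc = "doc-" + i;
            pool.submit(() -> sink.add(doc));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return received.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(1000, 8)); // prints 1000
    }
}
```

Sizing the pool against DB latency rather than SOLR latency is the point of Fuad's "additional 32 threads" remark: while some workers wait on the database, others are already submitting to Solr.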
${solr.abortOnConfigurationError:false} - does it default to false
I have one quick question... If solrconfig.xml says <abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>, does it mean abortOnConfigurationError defaults to false if it is not set as a system property? Thanks, Dharmveer -- View this message in context: http://www.nabble.com/%24%7Bsolr.abortOnConfigurationError%3Afalse%7D---does-it-defaults-to-false-tp25155213p25155213.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ${solr.abortOnConfigurationError:false} - does it default to false
On Aug 26, 2009, at 3:33 PM, djain101 wrote: I have one quick question... If solrconfig.xml says <abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>, does it mean abortOnConfigurationError defaults to false if it is not set as a system property? Correct.
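For reference, the `${property:default}` syntax in solrconfig.xml means exactly that: use the Java system property if it is set, otherwise fall back to the default after the colon. So with:

```xml
<abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>
```

the value is `false` unless Solr is started with the property set, e.g. `java -Dsolr.abortOnConfigurationError=true -jar start.jar`.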
Searching and Displaying Different Logical Entities
I'm trying to figure out if Solr is the right solution for a problem I'm facing. I have 2 data entities: P(arent) and C(hild). P contains up to 100 instances of C. I need to expose an interface that searches attributes of entity C, but displays them grouped by the parent entity, P. I need to include facet counts in the result, and the counts are based on P. My first solution was to create 2 Solr instances, one for each entity. I would have to execute 2 queries each time: 1) get a list of matching P's based on a query of the C instance (facet by P ID in the C instance to get a unique list of P's), then 2) get all P's by ID, including facet counts, etc. The problem I face with this solution is that I can have many matching P's (10,000+), so my second query will have many (10,000+) constraints. My second (and current) solution is to create a single instance and flatten all C attributes into the appropriate P record using dynamic fields. For example, if C has an attribute CA, then I have a dynamic field in P called CA*. I name this field incrementally based on the number of C's per P (CA1, CA2, ...). This works, except that each query is very long (CA1:condition OR CA2:condition ...). Neither solution is ideal. I'm wondering if I'm missing something obvious, or if I'm using the wrong solution for this problem. Any insight is appreciated. Wojtek -- View this message in context: http://www.nabble.com/Searching-and-Displaying-Different-Logical-Entities-tp25156301p25156301.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Searching and Displaying Different Logical Entities
then 2) get all P's by ID, including facet counts, etc. The problem I face with this solution is that I can have many matching P's (10,000+), so my second query will have many (10,000+) constraints. SOLR can automatically provide you with the P's and their counts, and they will be _unique_... Even if the cardinality of P is 10,000+, SOLR is very fast now (expect a few seconds' response time for the initial request). You need a single query with faceting... (!) You do not need the P's IDs as a separate step. Each document will have a unique ID, and fields such as P and C (with possible attributes). Do not think in terms of an RDBMS... Lucene does all the 'normalization' behind the scenes, and SOLR will give you P's with C's... -Original Message- From: wojtekpia [mailto:wojte...@hotmail.com] Sent: August-26-09 3:58 PM To: solr-user@lucene.apache.org Subject: Searching and Displaying Different Logical Entities
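Fuad's single-index suggestion, sketched concretely (all field names here are illustrative, not from the original thread): index one document per C, carry the parent's identifying fields on it denormalized, and let faceting group matches by parent in a single request:

```xml
<!-- schema.xml sketch: one document per child C, denormalized with parent data -->
<field name="id"     type="string" indexed="true" stored="true"/>
<field name="p_id"   type="string" indexed="true" stored="true"/>
<field name="p_name" type="string" indexed="true" stored="true"/>
<field name="c_attr" type="text"   indexed="true" stored="true"/>
```

A single query then searches the child attribute and facets on the parent in one round trip, avoiding both the 10,000-constraint second query and the CA1/CA2/... field explosion:

```
http://localhost:8983/solr/select?q=c_attr:condition&facet=true&facet.field=p_id&facet.mincount=1
```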
RE: SolrJ and Solr web simultaneously?
We already opened port 80 from Solr to the DB, so that's not the issue, but HTTP (port 80) is very flaky when there is a firewall between Solr and the DB. We have a Solr master/slaves environment; clients access search through the slaves (the master only accepts the new index from the DB, and the slaves pull the new indexes from the Solr master). We have someone on the development team who knows Java and can implement JDBC. We don't share the Solr master and the DB on the same box; they are separate boxes on separate networks, with port 80 opened between them. It looks like CommonsHttpSolrServer is a better approach than EmbeddedSolrServer, since we want the Solr master acting as a Solr server as well. I just worry that HTTP will be a bottleneck; that's why I prefer the JDBC connection method. Francis -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: Wednesday, August 26, 2009 11:56 AM To: solr-user@lucene.apache.org Subject: RE: SolrJ and Solr web simultaneously?
RE: SolrJ and Solr web simultaneously?
With this configuration, probably the preferred method is to run a standalone Java application on the same box as the DB, or very close to the DB (in the same network segment). HTTP is not a bottleneck; the main bottleneck is indexing/committing/merging/optimizing in SOLR... Just as a sample: if you submit a batch of large documents to SOLR, expect 5-55 seconds response time (even with EmbeddedSolr or pure Lucene), but nothing related to network latency or firewalling... uploading 1 MB over a 100 Mbps network takes less than 0.1 seconds, but indexing it may take 0.5 secs... A standalone application with SolrJ is also good because you can schedule batch updates etc.; automated... P.S. In theory, if you are using Oracle, you could even try to implement triggers written in Java causing a SOLR update on each row update (transactional); but I haven't heard of anyone using stored procs in Java; too risky and slow, with specific dependencies... -Original Message- From: Francis Yakin [mailto:fya...@liquid.com] Sent: August-26-09 4:18 PM To: 'solr-user@lucene.apache.org' Subject: RE: SolrJ and Solr web simultaneously?
Re: What makes a function query count as a match or not?
On Wed, Aug 26, 2009 at 11:27 AM, Christophe Biocca <christo...@openplaces.org> wrote: I haven't been able to find what makes a function query count as a match when used as part of a boolean query with Occur.MUST. A function query matches all non-deleted documents. -Yonik http://www.lucidimagination.com
RE: SolrJ and Solr web simultaneously?
I just worried that http will be a bottle neck, that's why I prefer JDBC connection method.
- JDBC is a library for a Java application; it connects to a database; it uses a proprietary protocol provided by the DB vendor in most cases, and a specific port number
- SolrJ is a library for a Java application; it connects to SOLR; it uses the HTTP protocol
RE: SolrJ and Solr web simultaneously?
No, we don't want to put it on the same box as the database box. Agreed that indexing/committing/merging and optimizing is the bottleneck. I think it's worth trying the SolrJ with CommonsHttpSolrServer option for now, and let's see what happens when we load 3 million docs. Thanks Francis -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: Wednesday, August 26, 2009 1:34 PM To: solr-user@lucene.apache.org Subject: RE: SolrJ and Solr web simultaneously?
RE: Solr Replication
Thanks for the response. It's interesting because when I run jconsole, all I can see is one ReplicationHandler JMX MBean. It looks like it is defaulting to the first slice it finds on its path. Is there any way to have multiple replication handlers, or at least obtain replication information per slice/instance via JMX, like how you can see attributes for each slice/instance via each replication admin JSP page? Thanks again. From: noble.p...@corp.aol.com Date: Wed, 26 Aug 2009 11:05:34 +0530 Subject: Re: Solr Replication To: solr-user@lucene.apache.org The ReplicationHandler is not enforced as a singleton, but for all practical purposes it is a singleton for one core. If an instance (a slice, as you say) is set up as a repeater, it can act as both a master and a slave. In the repeater, the configuration should be as follows:

MASTER
|_ SLAVE (I am a slave of MASTER)
|_ REPEATER (I am a slave of MASTER and a master to my slaves)
   |_ REPEATER_SLAVE (a slave of REPEATER)

The point is that the REPEATER will have a slave section with a masterUrl which points to MASTER, and REPEATER_SLAVE will have a slave section with a masterUrl pointing to the repeater. On Wed, Aug 26, 2009 at 12:40 AM, J G <skinny_joe...@hotmail.com> wrote: Hello, We are running multiple slices in our environment. I have enabled JMX and I am inspecting the replication handler MBean to obtain some information about the master/slave configuration for replication. Is the replication handler MBean a singleton? I only see one MBean for the entire server and it's picking an arbitrary slice to report on. So I'm curious whether every slice gets its own replication handler MBean. This is important because I have no way of knowing, in this specific server, any information about the other slices, in particular the master/slave value for the other slices. Reading through the Solr 1.4 replication strategy, I saw that a slice can be configured to be a master and a slave, i.e. a repeater. 
I'm wondering how repeaters work, because let's say I have a slice named 'A' with the master on server 1 and the slave on server 2; then how are these two servers communicating to replicate? Looking at the JMX information I have in the MBean, both isSlave and isMaster are set to true for my repeater, so how does this Solr slice know if it's the master or the slave? I'm a bit confused. Thanks. -- - Noble Paul | Principal Engineer| AOL | http://aol.com
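The roles Noble describes live entirely in each instance's `/replication` handler configuration, and the communication is plain HTTP: the slave side polls its master's `/replication` handler at `pollInterval` and pulls changed index files. A repeater simply declares both sections; hostnames and the interval below are illustrative:

```xml
<!-- solrconfig.xml on the repeater: a slave of MASTER, and a master to its own slaves -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

This is also why the repeater's MBean reports both isMaster and isSlave as true: the instance really is both at once, and which role applies depends on who is talking to it.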
SortableFloatFieldSource not accessible? (1.3)
The class SortableFloatFieldSource cannot be accessed from outside its package. So it can't be used as part of a FunctionQuery. Is there a workaround to this, or should I roll my own? Will it be fixed in 1.4?
RE: SolrJ and Solr web simultaneously?
Thanks for the response. I will try CommonsHttpSolrServer for now. Francis -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: Wednesday, August 26, 2009 1:34 PM To: solr-user@lucene.apache.org Subject: RE: SolrJ and Solr web simultaneously?
Re: Using Lucene's payload in Solr
While testing my code I discovered that my copyField with the PatternTokenizer does not do what I want. This is what I am indexing into Solr: <field name="title">2.0|Solr In Action</field> My copyField is simply: <copyField source="title" dest="titleRaw"/> Field titleRaw is of type title_raw: <fieldType name="title_raw" class="solr.TextField"> <analyzer type="index"> <tokenizer class="solr.PatternTokenizerFactory" pattern="[^#]*#(.*)" group="1"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.KeywordTokenizerFactory"/> </analyzer> </fieldType> For my example input, Solr In Action is indexed into the titleRaw field without the payload. But the payload is still stored. So when I retrieve the field titleRaw I still get back 2.0|Solr In Action, where what I really want is just Solr In Action. Is it possible to have the copyField strip off the payload while it is copying, since doing it in the analysis phase is too late? Or should I start looking into using UpdateProcessors as Chris had suggested? Bill On Fri, Aug 21, 2009 at 12:04 PM, Bill Au <bill.w...@gmail.com> wrote: I ended up not using an XML attribute for the payload since I need to return the payload in the query response. So I ended up going with: <field name="title">2.0|Solr In Action</field> My payload is numeric, so I can pick a non-numeric delimiter (i.e. '|'). Putting the payload in front means I don't have to worry about the delimiter appearing in the value. The payload is required in my case, so I can simply look for the first occurrence of the delimiter and ignore the possibility of the delimiter appearing in the value. I ended up writing a custom Tokenizer, and a copy field with a PatternTokenizerFactory to filter out the delimiter and payload. That is straightforward in terms of implementation. On top of that I can still use the CSV loader, which I really like because of its speed. Bill. 
On Thu, Aug 20, 2009 at 10:36 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote: : of the field are correct but the delimiter and payload are stored so they : appear in the response also. Here is an example: ... : I am thinking maybe I can do this instead when indexing: : : XML for indexing: : <field name="title" payload="2.0">Solr In Action</field> : : This will simplify indexing as I don't have to repeat the payload for each but now you're into a custom request handler for the updates to deal with the custom XML attribute, so you can't use DIH or CSV loading. It seems like it might be simpler to have two new (generic) UpdateProcessors: one that can clone fieldA into fieldB, and one that can do regex mutations on fieldB ... neither needs to know about payloads at all, but the first can make a copy of 2.0|Solr In Action and the second can strip off the 2.0| from the copy. Then you can write a new NumericPayloadRegexTokenizer that takes in two regex expressions -- one that knows how to extract the payload from a piece of input, and one that specifies the tokenization. Those three classes seem easier to implement, easier to maintain, and more generally reusable than a custom XML request handler for your updates. -Hoss
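Whichever mechanism ends up applying it (an UpdateProcessor as Hoss suggests, or a tokenizer), the mutation itself is just a prefix strip under Bill's chosen convention: payload first, then a `|` delimiter, payload always present. A minimal sketch of that one step (class and method names are illustrative, not from either proposal):

```java
public class PayloadStripper {
    // Given "2.0|Solr In Action", return "Solr In Action".
    // The payload is required and numeric, so everything up to and
    // including the FIRST '|' can be dropped; any later '|' characters
    // that happen to appear in the title are left alone.
    static String stripPayload(String raw) {
        int sep = raw.indexOf('|');
        return sep >= 0 ? raw.substring(sep + 1) : raw;
    }

    public static void main(String[] args) {
        System.out.println(stripPayload("2.0|Solr In Action")); // prints "Solr In Action"
    }
}
```

Putting the payload in front is what makes this safe: the first delimiter is always the boundary, so no escaping scheme is needed for the value.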
Sorting by Unindexed Fields
Hi, I have a situation where a particular kind of document can be categorized in different ways, and depending on the categories it is in it will have different fields that describe it (in practice the number of fields will be fairly small, but whatever). These documents will each have a full-text field that Solr is perfect for, and it seems like Solr's dynamic fields ability makes it an even more perfect solution. I'd like to be able to sort by any of the fields, but indexing them all seems somewhere between unwise and impossible. Will Solr sort by fields that are unindexed? iSac
Re: SortableFloatFieldSource not accessible? (1.3)
SortableFloatField works in function queries... it's just that everyone goes through SortableFloatField.getValueSource() to create them. Will that work for you? -Yonik http://www.lucidimagination.com On Wed, Aug 26, 2009 at 6:23 PM, Christophe Biocca christo...@openplaces.org wrote: The class SortableFloatFieldSource cannot be accessed from outside its package, so it can't be used as part of a FunctionQuery. Is there a workaround to this, or should I roll my own? Will it be fixed in 1.4?
Re: Sorting by Unindexed Fields
Will Solr sort by fields that are unindexed? Unfortunately, No. Cheers Avlesh On Thu, Aug 27, 2009 at 4:03 AM, Isaac Foster isaac.z.fos...@gmail.comwrote: Hi, I have a situation where a particular kind of document can be categorized in different ways, and depending on the categories it is in it will have different fields that describe it (in practice the number of fields will be fairly small, but whatever). These documents will each have a full-text field that Solr is perfect for, and it seems like Solr's dynamic fields ability makes it an even more perfect solution. I'd like to be able to sort by any of the fields, but indexing them all seems somewhere between unwise and impossible. Will Solr sort by fields that are unindexed? iSac
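To make dynamically-named category fields sortable, they have to be declared indexed (and be single-valued). A schema.xml sketch -- the dynamic field names and types below are examples, not taken from the original poster's schema:

```xml
<!-- Example only: fields used for sorting must be indexed and single-valued.
     Sorting on tokenized text is unreliable, so use string/sortable-numeric types. -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_f" type="sfloat" indexed="true" stored="true"/>
```

With a convention like this, any per-category attribute (price_f, author_s, ...) is sortable without declaring each field up front.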
Re: Sorting by Unindexed Fields
Is it also the case that it will not narrow by them? Isaac On Wed, Aug 26, 2009 at 8:59 PM, Avlesh Singh avl...@gmail.com wrote: Will Solr sort by fields that are unindexed? Unfortunately, no. Cheers Avlesh
Re: Sorting by Unindexed Fields
Is it also the case that it will not narrow by them? If narrowing means faceting, then again a no. Cheers Avlesh On Thu, Aug 27, 2009 at 6:36 AM, Isaac Foster isaac.z.fos...@gmail.com wrote: Is it also the case that it will not narrow by them? Isaac
Fwd: Lucene Search Performance Analysis Workshop
While Andrzej's talk will focus on things at the Lucene layer, I'm sure there'll be some great tips and tricks useful to Solrians too. Andrzej is one of the sharpest folks I've met, and he's also a very impressive presenter. Tune in if you can. Erik Begin forwarded message: From: Andrzej Bialecki a...@getopt.org Date: August 26, 2009 5:44:40 PM EDT To: java-u...@lucene.apache.org Subject: Lucene Search Performance Analysis Workshop Reply-To: java-u...@lucene.apache.org Hi all, I am giving a free talk/ workshop next week on how to analyze and improve Lucene search performance for native lucene apps. If you've ever been challenged to get your Java Lucene search apps running faster, I think you might find the talk of interest. Free online workshop: Thursday, September 3rd 2009 11:00-11:30AM PDT / 14:00-14:30 EDT Follow this link to sign up: http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP About: Lucene Performance Workshop: Understanding Lucene Search Performance with Andrzej Bialecki Experienced Java developers know how to use the Apache Lucene library to build powerful search applications natively in Java. LucidGaze for Lucene from Lucid Imagination, just released this week, provides a powerful utility for making transparent the underlying indexing and search operations, and analyzing their impact on search performance. Agenda: * Understanding sources of variability in Lucene search performance * LucidGaze for Lucene APIs for performance statistics * Applying LucidGaze for Lucene performance statistics to real-world performance problems Join us for a free online workshop. 
Sign up via the link below: http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP About the Presenter: Andrzej Bialecki, Apache Lucene PMC Member, is on the Lucid Imagination Technical Advisory Board; he also serves as the project lead for Nutch, and as committer in the Lucene-java, Nutch and Hadoop projects. He has broad expertise, across domains as diverse as information retrieval, systems architecture, embedded systems kernels, networking and business process/e-commerce modeling. He's also the author of the popular Luke index inspection utility. Andrzej holds a master's degree in Electronics from Warsaw Technical University, speaks four languages and programs in many, many more. -- Best regards, Andrzej Bialecki http://www.sigram.com Contact: info at sigram dot com
Total count of records
Hi, When Solr retrieves records based on an input match, it gives a total count of records. For example, it displays something like: 1 out of 20,000 for the particular search string. How is the total count of records fetched in Solr? Does it refer to any schema or XML file? Regards Bhaskar
Re: Total count of records
How is the total count of records fetched in Solr? Does it refer to any schema or XML file? Sorry, but I did not get you. What does that mean? The total count is not stored anywhere; it is computed based on how many documents in your index match the query. Cheers Avlesh On Thu, Aug 27, 2009 at 7:36 AM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi, When Solr retrieves records based on an input match, it gives a total count of records. For example, it displays something like: 1 out of 20,000 for the particular search string. How is the total count of records fetched in Solr? Does it refer to any schema or XML file? Regards Bhaskar
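Concretely, the count Bhaskar is seeing is the numFound attribute that Solr computes per query and returns in every response; nothing in schema.xml stores it. A typical (abridged) XML response:

```xml
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
  </lst>
  <!-- numFound is the total number of matching documents;
       only the requested page of results is returned below it -->
  <result name="response" numFound="20000" start="0">
    <doc>...</doc>
  </result>
</response>
```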
RE: Lucene Search Performance Analysis Workshop
I am wondering... are the new SOLR filtering features faster than standard Lucene queries like {query} AND {filter}? Why can't we improve Lucene then? Fuad P.S. https://issues.apache.org/jira/browse/SOLR-1169 https://issues.apache.org/jira/browse/SOLR-1179 -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: August-26-09 8:50 PM To: solr-user@lucene.apache.org Subject: Fwd: Lucene Search Performance Analysis Workshop While Andrzej's talk will focus on things at the Lucene layer, I'm sure there'll be some great tips and tricks useful to Solrians too. Andrzej is one of the sharpest folks I've met, and he's also a very impressive presenter. Tune in if you can. Erik
RE: SolrJ and Solr web simultaneously?
Frankly, I have never tried DIH... It is probably the best option for this specific case (they have a Java developer) - but one should be knowledgeable enough to design the SOLR schema... And I have noticed here (and also on the HBase mailing list) that many first-time users are still thinking in relational-DBMS terms and trying to index their tables with relations (and different PKs) as-is, instead of indexing their documents... I am doing a constant 1000+ docs per second now, at 5%-15% CPU... small docs, 5KB in size on average, 7 fields... yes, correct, 3M+ docs in an hour... and it could be 10 times more!!! (at the current 5%-15% CPU) Fuad With a relational database, the approach that has been working for us and many customers is to first give DataImportHandler a go. It's powerful and fast. 3M docs should index in about an hour or less, I'd speculate. But using DIH does require making access from Solr to the DB server solid, of course. Erik
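For readers weighing DIH against custom indexing code: a minimal data-config.xml for pulling rows out of a relational table might look like the following. The table, column, and connection details are made up for illustration:

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"/>
  <document>
    <!-- one Solr document per row; columns map to schema fields -->
    <entity name="doc" query="SELECT id, title, body FROM docs">
      <field column="id"    name="id"/>
      <field column="title" name="title"/>
      <field column="body"  name="content"/>
    </entity>
  </document>
</dataConfig>
```

This is also where the "think in documents, not tables" advice bites: joins across related tables are usually flattened into one denormalized document per entity rather than imported table-by-table.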
Re: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS
Hey Guys, Ok, I found this: Troubleshooting Errors It's possible that you get an error related to the following:

SEVERE: Exception starting filter SolrRequestFilter
java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.core.SolrConfig
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:76)
...
Caused by: java.lang.RuntimeException: XPathFactory#newInstance() failed to create an XPathFactory for the default object model: http://java.sun.com/jaxp/xpath/dom with the XPathFactoryConfigurationException: javax.xml.xpath.XPathFactoryConfigurationException: No XPathFctory implementation found for the object model: http://java.sun.com/jaxp/xpath/dom
    at javax.xml.xpath.XPathFactory.newInstance(Unknown Source)

This is due to your tomcat instance not having the xalan jar file in the classpath. It took me some digging to find this, and thought it might be useful for others. The location varies from distribution to distribution, but I essentially just added (via a symlink) the jar file to the shared/lib directory under the tomcat directory. I am a java n00b. How can I set this up? On Tue, Aug 18, 2009 at 10:16 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : -Dsolr.solr.home='/some/path' : : Should I be putting that somewhere? Or is that already taken care of : when I edited the web.xml file in my solr.war file? No ... you do not need to set that system property if you already have it working because of modifications to the web.xml ... according to the log you posted earlier, Solr is seeing your solr home dir set correctly... 
Aug 17, 2009 11:16:15 PM org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: Using JNDI solr.home: /usr/share/solr
Aug 17, 2009 11:16:15 PM org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: /usr/share/solr/solr.xml
Aug 17, 2009 11:16:15 PM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/usr/share/solr/'

...that's where you want it to point, correct? (don't be confused by the later message of Check solr/home property ... that's just a hint because 9 times out of 10 an error initializing solr comes from solr needing to *guess* about the solr home dir) The crux of your error is not being able to load an XPathFactory; the fact that it can't load an XPath factory prevents your classloader from even being able to load the SolrConfig class -- note this also in the log you posted earlier... java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.core.SolrConfig ...the root of the problem is here... Caused by: java.lang.RuntimeException: XPathFactory#newInstance() failed to create an XPathFactory for the default object model: http://java.sun.com/jaxp/xpath/dom with the XPathFactoryConfigurationException: javax.xml.xpath.XPathFactoryConfigurationException: No XPathFctory implementation found for the object model: http://java.sun.com/jaxp/xpath/dom at javax.xml.xpath.XPathFactory.newInstance(Unknown Source) at org.apache.solr.core.Config.<clinit>(Config.java:41) XPathFactory.newInstance() is used to construct an instance of an XPathFactory where the concrete type is unknown by the caller (in this case: solr). There is an alternate form (XPathFactory.newInstance(String uri)) which allows callers to specify *which* model they want, and it can throw an exception if the model isn't available in the current JVM using reflection, but if you read the javadocs for the method being called... 
http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/xpath/XPathFactory.html#newInstance() Get a new XPathFactory instance using the default object model, DEFAULT_OBJECT_MODEL_URI, the W3C DOM. This method is functionally equivalent to: newInstance(DEFAULT_OBJECT_MODEL_URI) Since the implementation for the W3C DOM is always available, this method will never fail. ...except that in your case, it is in fact clearly failing, which suggests that your hosting provider has given you a crappy JVM. I have no good suggestions for debugging this, other than this google link... http://www.google.com/search?q=+No+XPathFctory+implementation+found+for+the+object+model%3A+http%3A%2F%2Fjava.sun.com%2Fjaxp%2Fxpath%2Fdom The good news is, there isn't anything Solr-specific about this problem. Any servlet container giving you that error when you load solr should cause the exact same error with a servlet as simple as this...

public class TestServlet extends javax.servlet.http.HttpServlet {
    public static Object X = javax.xml.xpath.XPathFactory.newInstance();
    public void doGet(javax.servlet.http.HttpServletRequest req,
                      javax.servlet.http.HttpServletResponse res) {
        // NOOP
    }
}

...which should provide you with a nice short bug report for your hosting provider. One last important note (because it may burn you once you get the XPath problem
RE: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS
Looks like you totally ignored my previous post... Who is the vendor of this openjdk-1.6.0.0? Who is the vendor of the JVM this JDK runs on? ... Such installs for Java are a total mess; you may have an incompatible Servlet API loaded by the bootstrap classloader before the Tomcat classes. First of all, please try to install the standard Java from Sun on your development box, and run some samples... > This is due to your tomcat instance not having the xalan jar file in > the classpath P.S. Don't rely on CentOS 'approved' Java libraries.
Re: master/slave replication issue
The log messages are shown when you hit the admin page, so don't worry about that. Keep a minimal configuration of Replication. All you need is masterUrl and pollInterval. On Thu, Aug 27, 2009 at 5:52 AM, J G skinny_joe...@hotmail.com wrote: Hello, I'm having an issue getting the master to replicate its index to the slave. Below you will find my configuration settings. Here is what is happening: I can access the replication dashboard for both the slave and master, and I can successfully execute HTTP commands against both of these urls through my browser. Now, my slave is configured to use the same URL as the one I am using in my browser when I query the master, yet when I do a tail -f tomcat home/logs/catalina.out on the slave server all I see is:

Master - server1.xyz.com
Aug 27, 2009 12:13:29 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=8
Aug 27, 2009 12:13:32 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=8
Aug 27, 2009 12:13:34 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=4
Aug 27, 2009 12:13:36 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=4
Aug 27, 2009 12:13:39 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=4
Aug 27, 2009 12:13:42 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=8
Aug 27, 2009 12:13:44 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=

For some reason, the webapp and the path are being set to null and I think this is affecting the replication?!? I am running Solr as the WAR file and it's 1.4 from a few weeks ago. 
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Replicate on 'optimize'. Other values can be 'commit', 'startup'.
         It is possible to have multiple entries of this config string -->
    <str name="replicateAfter">optimize</str>
    <!-- Create a backup after 'optimize'. Other values can be 'commit', 'startup'.
         It is possible to have multiple entries of this config string. Note
         that this is just for backup; replication does not require this -->
    <str name="backupAfter">optimize</str>
    <!-- If configuration files need to be replicated give the names here, separated by comma -->
    <!-- <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str> -->
  </lst>
</requestHandler>

Notice that I commented out the replication of the configuration files. I didn't think this was important for the attempt to get replication working. However, is it good to have these files replicated?

Slave - server2.xyz.com

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- Fully qualified url for the replication handler of the master. It is
         possible to pass this as a request param for the fetchindex command -->
    <str name="masterUrl">http://server1.xyz.com:8080/jdoe/replication</str>
    <!-- Interval at which the slave should poll the master. Format is HH:mm:ss.
         If this is absent the slave does not poll automatically, but a fetchindex
         can be triggered from the admin or the HTTP API -->
    <str name="pollInterval">00:00:20</str>
    <!-- THE FOLLOWING PARAMETERS ARE USUALLY NOT REQUIRED -->
    <!-- To use compression while transferring the index files. The possible
         values are internal|external. If the value is 'external' make sure that
         your master Solr has the settings to honour the accept-encoding header;
         see http://wiki.apache.org/solr/SolrHttpCompression for details. If it
         is 'internal' everything will be taken care of automatically.
         USE THIS ONLY IF YOUR BANDWIDTH IS LOW.
         THIS CAN ACTUALLY SLOW DOWN REPLICATION IN A LAN -->
    <str name="compression">internal</str>
    <!-- The following values are used when the slave connects to the master to
         download the index files. Default values are implicitly set as 5000ms
         and 10000ms respectively. The user DOES NOT need to specify these unless
         the bandwidth is extremely low or the latency is extremely high -->
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">10000</str>
    <!-- If HTTP Basic authentication is enabled on the master, then the slave
         can be configured with the following -->
    <str name="httpBasicAuthUser">username</str>
    <str name="httpBasicAuthPassword">password</str>
  </lst>
</requestHandler>

Thanks for your help!
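For debugging setups like this, the ReplicationHandler also answers commands over plain HTTP, so you can drive replication manually from the slave before trusting the poll interval. The host/port/path below match the poster's masterUrl layout and are otherwise placeholders:

```
# Ask the slave to pull the index from the master right now
curl 'http://server2.xyz.com:8080/jdoe/replication?command=fetchindex'

# Inspect replication state (the same command=details call seen in the logs above)
curl 'http://server2.xyz.com:8080/jdoe/replication?command=details'

# Temporarily stop automatic polling while debugging
curl 'http://server2.xyz.com:8080/jdoe/replication?command=disablepoll'
```

If a manual fetchindex succeeds, the problem is in the polling configuration rather than in connectivity between slave and master.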
Re: Max limit on number of cores?
There is no hard limit; it is going to be decided by your hardware. You will be limited by the number of files that can be kept open by your system. On Thu, Aug 27, 2009 at 1:06 AM, djain101 dharmveer_j...@yahoo.com wrote: Hi, Is there any maximum limit on the number of cores one solr webapp can have without compromising on its performance? If yes, what is that limit? Thanks, Dharmveer -- View this message in context: http://www.nabble.com/Max-limit-on-number-of-cores--tp25155334p25155334.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer | AOL | http://aol.com