How to display solr search results in Json format
I have indexed all my database data in Solr. Now I want to run searches on it and display the results in JSON. What do I need to do for this? - Thanks Regards Romi
Re: How to display solr search results in Json format
Hi Romi, When querying the Solr index, use 'wt=json' as part of your query string to get the results back in JSON format. On Tue, May 31, 2011 at 11:35 AM, Romi romijain3...@gmail.com wrote: > I have indexed all my database data in Solr. Now I want to run searches on it and display the results in JSON. [...] -- Thanks and Regards, DakshinaMurthy BM
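For example, assuming the stock example server on localhost:8983, a query like the following returns the results as JSON (indent=true just makes the output readable):

    http://localhost:8983/solr/select?q=*:*&wt=json&indent=true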
Re: How to display solr search results in Json format
Thanks for the reply. But I want to know how the JSON output is produced internally; I mean, how does it display the results as field:value? - Thanks Regards Romi
Re: is replication eating up OldGen space
Some more info: after one week the servers have the following status:

Master (indexing only)
+ looks good and has a heap size of about 6g from 10g OldGen
+ has meanwhile loaded the index twice from scratch via DIH
+ has added new documents into the existing index via DIH
+ has optimized and replicated
+ no full GC within one week

Slave A (search only) Online
- looks bad and has a heap size of 9.5g from 10g OldGen
+ was replicated
- several full GCs

Slave B (search only) Backup
+ looks good, has a heap size of 4g from 10g OldGen
+ was replicated
+ no full GC within one week

Conclusion:
+ DIH, processing, indexing, replication are fine
- the search is crap and eats up OldGen heap which can't be cleaned up by full GC. Maybe memory leaks or whatever...

Due to this, Solr 3.1 can _NOT_ be recommended as a high-availability, high-search-load search engine because of unclear heap problems caused by the search. The search is out of the box, so no self-produced programming errors. Are any tools available for Java to analyze this? (like valgrind or electric fence for C++) Is it possible to analyze a heap dump produced with jvisualvm? Which tools? Bernd

Am 30.05.2011 15:51, schrieb Bernd Fehling:
> Dear list, after switching from FAST to Solr I get the first _real_ data. This includes search times, memory consumption, performance of Solr, ... What I recognized so far is that something eats up my OldGen, and I assume it might be replication.
> Current data: one master - indexing only; two slaves - search only; over 28 million docs; single instance; single core; index size 140g; current heap size 16g.
> After startup I have about 4g heap in use and about 3.5g of OldGen. After one week and some replications, OldGen is filled close to 100 percent. If I start an optimize under this condition I get an OOM of heap. So my assumption is that something is eating up my heap. Any idea how to trace this down? Maybe a memory leak somewhere?
> Best regards, Bernd
Re: How to display solr search results in Json format
I am a little confused about your question. In case you are looking to access the JSON object returned by Solr, decode the JSON object using a programming language of your choice. The document set can be accessed using $json['response']['docs'] (in PHP). This is an array of hashes (associative arrays). Each element of this array is one document. You can iterate through these documents and display the results as doc[fieldname]. But if you are looking for the internals of the JSON response writer, you can look at JSONResponseWriter.java in the package org.apache.solr.request. On Tue, May 31, 2011 at 11:52 AM, Romi romijain3...@gmail.com wrote: > Thanks for the reply. But I want to know how the JSON output is produced internally; I mean, how does it display the results as field:value? [...] -- Thanks and Regards, DakshinaMurthy
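The same iteration in Java might look roughly like this - a minimal sketch using the org.json library, where the field names 'id' and 'name' and the rawJson variable are assumptions for illustration:

    import org.json.JSONArray;
    import org.json.JSONException;
    import org.json.JSONObject;

    public class PrintDocs {
        // rawJson: the response body returned by Solr when wt=json is set
        static void printDocs(String rawJson) throws JSONException {
            JSONObject response = new JSONObject(rawJson).getJSONObject("response");
            JSONArray docs = response.getJSONArray("docs");
            for (int i = 0; i < docs.length(); i++) {
                JSONObject doc = docs.getJSONObject(i); // one document: field -> value pairs
                System.out.println(doc.optString("id") + " : " + doc.optString("name"));
            }
        }
    }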
Re: DataImportHandler
Yes, actually the problem was this: I'm working with a Maven project and I had added the dependency (a <dependency>...</dependency> entry) in my pom.xml. Maven loaded this jar and something must have been wrong with it. I decided to eliminate this dependency and add the jar in the folder I pointed at in the solrconfig.xml file. When I ran Solr again, everything worked correctly.
how to index pdf in solr/browse
hello, I'm new to Solr. I have installed Apache Solr 3.1.0 on Ubuntu 10.04, but I have a problem with solr/browse: I have already changed the settings in velocity/hit.vm to be:

<div class="result-document">
  #foreach($fieldname in $doc.fieldNames)
    <p>
      <span class="field-name">$fieldname :</span>
      <span>
        #foreach($value in $doc.getFieldValues($fieldname))
          $value
        #end
      </span>
    </p>
  #end
  #if($params.getBool("debugQuery",false))
    <a href="#" onclick='jQuery(this).siblings("pre").toggle(); return false;'>toggle explain</a>
    <pre style="display:none">$response.getExplainMap().get($doc.getFirstValue('id'))</pre>
  #end
</div>

I changed hit.vm to be able to view PDF files on solr/browse. How do I make the highlighting keep appearing when a result is clicked, and how do I make a download button? please help, thank you very much. greeting, Reynaldi
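One possible shape for the download button, a hypothetical Velocity sketch that assumes you store the file's URL or path in a field (here named 'url'; both the field name and the approach are assumptions):

    ## hypothetical download link; assumes a stored field 'url' holding the file location
    <a href="$doc.getFirstValue('url')">download</a>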
CLOSE_WAIT after connecting to multiple shards from a primary shard
Hi, We are having a primary Solr shard and multiple secondary shards. We query data from the secondary shards by specifying the shards param in the query params. But we found that after receiving the data, there are a large number of CLOSE_WAIT connections on the secondary shards from the primary shards. For example:

tcp 1 0 primaryshardhost:56109 secondaryshardhost1:8090 CLOSE_WAIT
tcp 1 0 primaryshardhost:51049 secondaryshardhost1:8090 CLOSE_WAIT
tcp 1 0 primaryshardhost:49537 secondaryshardhost1:8089 CLOSE_WAIT
tcp 1 0 primaryshardhost:44109 secondaryshardhost2:8090 CLOSE_WAIT
tcp 1 0 primaryshardhost:32041 secondaryshardhost2:8090 CLOSE_WAIT
tcp 1 0 primaryshardhost:48533 secondaryshardhost2:8089 CLOSE_WAIT

We even changed the code to open the Solr connections as below:

SimpleHttpConnectionManager cm = new SimpleHttpConnectionManager(true);
cm.closeIdleConnections(0L);
HttpClient httpClient = new HttpClient(cm);
solrServer = new CommonsHttpSolrServer(url, httpClient);
solrServer.optimize();

But we still see these issues. Any ideas? Does Solr persist the connections to the secondary shards? -- Thanks, Mukunda
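For what it's worth, a common alternative with Commons HttpClient 3.x is to share one MultiThreadedHttpConnectionManager across all CommonsHttpSolrServer instances and bound its pool. A minimal sketch, where the pool sizes and the URL are assumptions:

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class SharedClient {
        public static void main(String[] args) throws Exception {
            MultiThreadedHttpConnectionManager cm = new MultiThreadedHttpConnectionManager();
            cm.getParams().setDefaultMaxConnectionsPerHost(20); // assumed limits
            cm.getParams().setMaxTotalConnections(100);
            HttpClient httpClient = new HttpClient(cm);
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://primaryshardhost:8090/solr", httpClient);
            // ... run queries through 'server' ...
            cm.closeIdleConnections(0L); // proactively close idle sockets
        }
    }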
DIH: Exception with Too many connections
Hi all, I'm using DIH and getting the following error. My Solr version is Solr 3.1.

=====
...
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Could not create connection to database server. Attempted reconnect 3 times. Giving up.
    at sun.reflect.GeneratedConstructorAccessor98.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:406)
    at com.mysql.jdbc.Util.getInstance(Util.java:381)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:985)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
    at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2364)
    at com.mysql.jdbc.ConnectionImpl.init(ConnectionImpl.java:781)
    at com.mysql.jdbc.JDBC4Connection.init(JDBC4Connection.java:46)
    at sun.reflect.GeneratedConstructorAccessor94.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:406)
    at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:352)
    at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:284)
    at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
    at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:128)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:363)
    at org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:39)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240)
    ... 11 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Data source rejected establishment of connection, message from server: Too many connections
    at sun.reflect.GeneratedConstructorAccessor98.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:406)
    at com.mysql.jdbc.Util.getInstance(Util.java:381)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:985)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
    at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1104)
    at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2292)
    ... 24 more
=====

My dataSource setting is something like this:

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://database01/test?autoReconnect=true"
    user="xxx" password="xxx" batchSize="-1" />

Any idea how to solve this problem? Thank you!
Re: DIH: Exception with Too many connections
It looks like you are not able to connect to the database. Please see whether you get a similar exception when you try to connect from other clients. On Tue, May 31, 2011 at 3:01 PM, tiffany tiffany.c...@future.co.jp wrote: > Hi all, I'm using DIH and getting the following error. My Solr version is Solr 3.1. > Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Could not create connection to database server. Attempted reconnect 3 times. Giving up. [...] > Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Data source rejected establishment of connection, message from server: Too many connections [...] > Any idea how to solve this problem? Thank you! -- Chandan Tamrakar
Re: DIH: Exception with Too many connections
Tiffany, On Tue, May 31, 2011 at 11:16 AM, tiffany tiffany.c...@future.co.jp wrote: > Any idea how to solve this problem? In addition to Chandan's suggestion: check your MySQL process list and have a look at what is displayed there. Regards, Stefan
Re: DIH: Exception with Too many connections
Thanks for your reply, Chandan. Here is some additional information. I'm also using the multi-core function, and I run the delta-import commands in parallel to save running time. If I don't run them in parallel, it works fine. Each core accesses the same database server but a different schema. So I don't know whether I should change something on my database server side, or whether I can adjust something on the Solr side by adding some kind of property. Tiffany
Getting and viewing a heap dump
Hi Bernd, I'm assuming Linux here; if you're running something else these instructions might differ slightly. First get a heap dump with:

jmap -heap:format=b,file=/path/to/generate/heapdumpfile.hprof 1234

with 1234 being the PID (process id) of the JVM. After you get a heap dump you can analyze it with Eclipse MAT (Memory Analyzer Tool). Just a heads up if you're doing this in production: the JVM will freeze completely while generating the heap dump, which will seem like a giant stop-the-world GC with a 10GB heap. Good luck with finding out what's eating your memory! Constantijn

P.S. Sorry about altering the subject line, but the spam assassin used by the mailing list was rejecting my post because it had "replication" in the subject line; hope it doesn't mess up the thread.

On Tue, May 31, 2011 at 8:43 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: > Some more info: after one week the servers have the following status [...] > Are any tools available for Java to analyze this? (like valgrind or electric fence for C++) Is it possible to analyze a heap dump produced with jvisualvm? Which tools? [...]
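As an aside, if the problem eventually manifests as an OutOfMemoryError, the JVM can also be told to write the dump automatically at that moment. These are standard HotSpot flags; the dump path is an assumption:

    java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps -jar start.jar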
Re: DIH: Exception with Too many connections
Thanks Stefan! I executed the SHOW PROCESSLIST; command. (Is that what you meant? I've never tried it before...) It seems that when I executed one delta-import command, several connections appeared in the process list and were removed after commit. Also it looks like the number of connections is pretty much equal to the number of entities in my db-data-config.xml. So, if the number of connections in the process list is larger than max_connections, I would get the "too many connections" error. Am I thinking about this the right way? If so, maybe I should think about the commit timing, changing the value of max_connections, and/or some other ways... If there are any other ideas, please let me know =) Thanks a lot!
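For reference, the MySQL side of this can be inspected and adjusted like so (the value 512 is just an assumption; pick one that fits your server):

    SHOW PROCESSLIST;
    SHOW VARIABLES LIKE 'max_connections';
    SET GLOBAL max_connections = 512;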
Re: Getting and viewing a heap dump
Hi Constantijn, yes, I use Linux 64bit, and thanks for the help. Bernd Am 31.05.2011 12:22, schrieb Constantijn Visinescu: > Hi Bernd, I'm assuming Linux here [...] First get a heap dump with: jmap -heap:format=b,file=/path/to/generate/heapdumpfile.hprof 1234 [...] After you get a heap dump you can analyze it with Eclipse MAT (Memory Analyzer Tool). [...]
RE: newbie question for DataImportHandler
In the OP it's stated that the index was deleted. I'm guessing that means the physical files, /data/

quote: "populate the table with another million rows of data. I remove the index that solr previously create. I restart solr and go to the data import handler development console and do the full import again." endquote

Is there a separate cache that could be causing the issue? I'm a newbie as well, and it seems that if I delete the index there shouldn't be any vestige of info left anywhere. Thanks

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, May 29, 2011 9:00 PM
To: solr-user@lucene.apache.org
Subject: Re: newbie question for DataImportHandler

This trips up a lot of folks. Solr just marks docs as deleted; the terms etc. are left in the index until an optimize is performed or the segments are merged. The latter isn't very predictable, so just do an optimize. The docs aren't returned as results, though. Best, Erick

On May 24, 2011 10:22 PM, antoniosi antonio...@gmail.com wrote: > Hi, I am new to Solr; apologies in advance if this is a stupid question. I have created a simple database, with only 1 table with 3 columns: id, name, and last_update fields. I populate the database with 1 million test rows. I run Solr, go to the data import handler development console and do a full import. I use the Luke tool to look at the content of the Lucene index. This all works fine so far. I remove all the 1 million rows from my table and populate the table with another million rows of data. I remove the index that Solr previously created. I restart Solr and go to the data import handler development console and do the full import again. I use the Luke tool to look at the content of the Lucene index. However, I am seeing the old data in my new index. Does Solr keep a cached copy of the index somewhere? I hope I have described my problem clearly. Thanks in advance.
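If it helps, an optimize can be triggered by posting an <optimize/> message to the update handler, for example (host and core path are assumptions):

    curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary '<optimize/>'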
Re: DIH: Exception with Too many connections
Tiffany, On Tue, May 31, 2011 at 12:45 PM, tiffany tiffany.c...@future.co.jp wrote: > I executed the SHOW PROCESSLIST; command. (Is that what you meant? I've never tried it before...) Exactly this, yes :) > So, if the number of connections in the process list is larger than max_connections, I would get the "too many connections" error. Am I thinking about this the right way? Yepp, right. > If so, maybe I should think about the commit timing, changing the value of max_connections, and/or some other ways... You could raise the allowed number of connections for the MySQL server, or, of course, if possible, tweak your Solr settings, correct. Regards, Stefan
Re: DIH: Exception with Too many connections
Hi, You might also check the 'max_user_connections' setting too, if you have that set:

# Maximum number of connections, and per user
max_connections = 2048
max_user_connections = 2048

http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html

Cheers, François

On May 31, 2011, at 7:39 AM, Stefan Matheis wrote: > You could raise the allowed number of connections for the MySQL server, or, of course, if possible, tweak your Solr settings. [...]
Solr NRT
Hi, I have the following strange use case: Index 100 documents and make them immediately available for search. I call this on the fly indexing. Then the index can be removed. So the size of the index is not an issue here. Is this possible with Solr? Anyone tried something similar? Thank you, Ionut
RE: Solr NRT
Unless you cross a Solr server commit threshold, your client has to post a <commit/> message before the server content becomes available for searching. Unfortunately, the Solr tool that is supposed to do this apparently doesn't. I asked for community help last week and was surprised to receive no response; I thought having to leave a Solr import process in an incomplete state would be more of a concern. In any case, our (hopefully temporary) solution was to hack the source code of the SimplePostTool demo to turn it into a CommitTool. Once Solr receives the <commit/> post you will be able to search for your recently added documents.

-----Original Message-----
From: Ionut Manta [mailto:ionut.ma...@gmail.com]
Sent: Tuesday, May 31, 2011 7:41 AM
To: solr-user@lucene.apache.org
Subject: Solr NRT

> Hi, I have the following strange use case: index 100 documents and make them immediately available for search. I call this "on the fly" indexing. Then the index can be removed, so the size of the index is not an issue here. Is this possible with Solr? Has anyone tried something similar? Thank you, Ionut
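A minimal way to issue that commit from Java with SolrJ is sketched below; the server URL is an assumption:

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class CommitTool {
        public static void main(String[] args) throws Exception {
            // point at your Solr instance; the URL is an assumption
            CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            server.commit(); // newly added documents become searchable after this returns
        }
    }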
RE: DIH: Exception with Too many connections
Hi, There is an existing bug in DataImportHandler, described (and patched) at https://issues.apache.org/jira/browse/SOLR-2233. It is not used in a thread-safe manner, it is not appropriately closed/reopened (why?), and a new connection is opened unpredictably. It may cause "Too many connections" even with a huge SQL-side max_connections. If you are interested, I can continue work on SOLR-2233. CC: dev@lucene (is anyone working on DIH improvements?) Thanks, Fuad Efendi http://www.tokenizer.ca/

-----Original Message-----
From: François Schiettecatte [mailto:fschietteca...@gmail.com]
Sent: May-31-11 7:44 AM
To: solr-user@lucene.apache.org
Subject: Re: DIH: Exception with Too many connections

> Hi, You might also check the 'max_user_connections' setting too, if you have that set: max_connections = 2048, max_user_connections = 2048. http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html [...]
Re: Solr NRT
What results did you get with this hack? How long does it take from when you start indexing documents until you get a search result? Did you try NRT? On Tue, May 31, 2011 at 3:47 PM, David Hill dh...@studentloan.org wrote: > Unless you cross a Solr server commit threshold, your client has to post a <commit/> message before the server content becomes available for searching. [...] > Once Solr receives the <commit/> post you will be able to search for your recently added documents. [...]
Re: Bulk indexing, UpdateProcessor overwriteDupes and poor IO performances
Tanguy, You might have tried this already, but can you set overwriteDupes to false and set the signature field to be the id? That way Solr will manage the updates. From the wiki: http://wiki.apache.org/solr/Deduplication

<!-- An example dedup update processor that creates the id field on the fly based on the hash code of some other fields. This example has overwriteDupes set to false since we are using the id field as the signatureField and Solr will maintain uniqueness based on that anyway. -->

(A configuration sketch along those lines follows the quoted message below.) HTH, Lee

On 30 May 2011 08:32, Tanguy Moal tanguy.m...@gmail.com wrote: Hello, Sorry for re-posting this, but it seems my message got lost in the mailing list's message stream without hitting anyone's attention... =D Shortly: has anyone already experienced dramatic indexing slowdowns during large bulk imports with overwriteDupes turned on and a fairly high duplicates rate (around 4-8x)? It seems to produce a lot of deletions, which in turn appear to make the merging of segments pretty slow, by fairly increasing the number of little read operations occurring simultaneously with the regular large write operations of the merge. Added to the poor IO performance of a commodity SATA drive, indexing takes ages. I temporarily bypassed that limitation by disabling the overwriting of duplicates, but that changes the way I request the index, requiring me to turn on field collapsing at search time. Is this a known limitation? Has anyone a few hints on how to optimize the handling of index-time deduplication? More details on my setup and the state of my understanding are in my previous message hereafter. Thank you very much in advance. Regards, Tanguy

On 05/25/11 15:35, Tanguy Moal wrote: Dear list, I'm posting here after some unsuccessful investigations. In my setup I push documents to Solr using the StreamingUpdateSolrServer. I'm sending a comfortable initial amount of documents (~250M) and wished to perform overwriting of duplicated documents at index time, during the update, taking advantage of the UpdateProcessorChain. At the beginning of the indexing stage, everything is quite fast; documents arrive at a rate of about 1000 docs/s. The only extra processing during the import is the computation of a couple of hashes that are used to identify documents uniquely given their content, using both stock (MD5Signature) and custom (derived from Lookup3Signature) update processors. I send a commit command to the server every 500k documents sent. During a first period, the server is CPU bound. After a short while (~10 minutes), the rate at which documents are received starts to fall dramatically, the server being IO bound. I had at first been thinking of a normal speed decrease during the commit, while my push client is waiting for the flush to occur. That would have been a normal slowdown. The thing that caught my attention was that, unexpectedly, the server was performing a lot of small reads, way more than the number of writes, which seem to be larger. The combination of the many small reads with the constant amount of bigger writes seems to be creating a lot of IO contention on my commodity SATA drive, and the ETA of my built index started to increase scarily =D I then restarted the JVM with JMX enabled so I could start investigating a little bit more. I then realized that the UpdateHandler was performing many reads while processing the update request. Are there any known limitations around the UpdateProcessorChain when overwriteDupes is set to true?
I turned that off, which of course breaks the intent of my built index, but for comparison purposes it's good. That did the trick; indexing is fast again, even with the periodic commits. I therefore have two questions, an interesting first one and a boring second one:

1/ What's the workflow of the UpdateProcessorChain when one or more processors have overwriting of duplicates turned on? What happens under the hood? I tried to answer that myself looking at DirectUpdateHandler2, and my understanding stopped at the following:
- The document is added to the Lucene IW
- The duplicates are deleted from the Lucene IW
The dark magic I couldn't understand seems to occur around the idTerm and updateTerm things, in the addDoc method. The deletions seem to be buffered somewhere; I just didn't get it :-) I might be wrong, since I didn't read the code more than that, but the point might be at how Solr handles deletions, which is something still unclear to me. Anyway, a lot of read operations seem to occur for that precise task and it tends to produce a lot of IO, killing indexing performance when overwriteDupes is on. I don't even understand why so many read operations occur at this stage, since my process had a comfortable amount of RAM (with Xms=Xmx=8GB), with only 4.5GB used so far. Any help, recommendation or idea is welcome
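The configuration Lee refers to, adapted from the wiki page he links; the field list here is an assumption and should be replaced with whatever fields define a duplicate in your schema:

    <updateRequestProcessorChain name="dedupe">
      <processor class="solr.processor.SignatureUpdateProcessorFactory">
        <bool name="enabled">true</bool>
        <str name="signatureField">id</str>
        <bool name="overwriteDupes">false</bool>
        <str name="fields">name,features,cat</str>
        <str name="signatureClass">solr.processor.Lookup3Signature</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>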
Re: Spellcheck component not returned with numeric queries
File an issue: https://issues.apache.org/jira/browse/SOLR-2556 On Monday 30 May 2011 16:07:41 Markus Jelsma wrote: Hi, The spell check component's output is not written when sending queries that consist of numbers only. Clients depending on the availability of the spellcheck output need to check if the output is actually there. This is with a very recent Solr 3.x check out. Is this a feature or a bug? File an issue? Cheers, -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Nutch Crawl error
This question would be better asked on the Nutch forum rather than the Solr forum. Best, Erick On Thu, May 26, 2011 at 12:06 PM, Roger Shah rs...@caci.com wrote: > I ran the command bin/nutch crawl urls -dir crawl -depth 3 crawl.log. When I viewed crawl.log I found some errors such as: Can't retrieve Tika parser for mime-type application/x-shockwave-flash, and some other similar messages for other types such as application/xml, etc. Do I need to download Tika for these errors to go away? Where can I download Tika so that it can work with Nutch? If there are instructions to install Tika to work with Nutch, please send them to me. Thanks, Roger
Re: Facet Query
I'm guessing you're faceting on an analyzed field. This is usually a bad idea. What is the use-case you're trying to solve? Best, Erick On Fri, May 27, 2011 at 12:51 AM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: > Hi, When I do a facet query on my data, it shows me a list of all the words present in my database with their counts. Is it possible to not get the results for common words like a, an, the, http and so on, but only get the counts for the terms we need, like microsoft, ipad, solr, etc.? -- Thanx & Regards, Jasneet Sabharwal
Re: How to disable QueryElevationComponent
Let's back up a bit. Why don't you want a uniqueKey? It's usually a good idea to have one, especially if you're using DIH. Best, Erick

On Fri, May 27, 2011 at 2:53 AM, Romi romijain3...@gmail.com wrote: > I removed

<searchComponent name="elevator" class="org.apache.solr.handler.component.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

from solrconfig.xml, but it is showing the following exception:

java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.DataImporter.identifyPk(DataImporter.java:152)
    at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:111)
    at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:113)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:486)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:588)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)

- Thanks Regards Romi
Re: Documents update
And it wouldn't work unless all the data is stored anyway. Currently there's no way to update a single field in a document, although there's work being done in that direction (see the column stride JIRA). What do you want to do with these fields? If it's to influence scoring, you could look at external fields. If the flags are a selection criterion, it's... harder. What are the flags used for? Could you consider essentially storing a map of the uniqueKeys and flags in a special document and having your app read that document and merge the results with the output? If this seems irrelevant, a more complete statement of the use-case would be helpful. Best, Erick On Fri, May 27, 2011 at 4:33 AM, Denis Kuzmenok forward...@ukr.net wrote: > I'm using 3.1 now. Indexing lasts for a few hours, and the plain data size is big. Getting all documents would be rather slow :( >> Not with 1.4, but apparently there is a patch for trunk. Not sure if it is in 3.1. If you are on 1.4, you could first query Solr to get the data for the document to be changed, change the modified values, and make a complete XML, including all fields, for post.jar. Regards, Gora
Re: DIH render html entities
Convert them to what? Individual fields in your docs? Text? If the former, you might get some joy from the XPathEntityProcessor. If you want to just strip the markup and index all the content, you might get some joy from the various *html* analyzers listed here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Best, Erick On Fri, May 27, 2011 at 5:19 AM, anass talby anass.ta...@gmail.com wrote: > Sorry, my question was not clear. When I get data from the database, some fields contain HTML special chars, and what I want to do is just convert them automatically. On Fri, May 27, 2011 at 1:00 PM, Gora Mohanty g...@mimirtech.com wrote: >> Is there any way to render HTML entities in DIH for a specific field? [...] This does not make too much sense: what do you mean by rendering HTML entities? DIH just indexes, so where would it render HTML to, even if it could? Please take a look at http://wiki.apache.org/solr/UsingMailingLists Regards, Gora -- Anass
Re: Documents update
Flags are stored to filter results, and the load is pretty high. It's working fine, but I can't update the index very often just to keep the flags up to date =\ Where can I read about using external fields / files? > And it wouldn't work unless all the data is stored anyway. Currently there's no way to update a single field in a document, although there's work being done in that direction (see the column stride JIRA). [...] > If this seems irrelevant, a more complete statement of the use-case would be helpful. Best, Erick
Re: Splitting fields
Hmmm, I wonder if a custom Transformer would help here? It can be inserted into a chain of transformers in DIH. Essentially, you subclass Transformer and implement one method (transformRow) and do anything you want. The input is a Map<String, Object> that is a simple representation of the Solr document. You can add/subtract/whatever you want to that map and then just return it. The map in transformRow has all the changes made by any other entries in the transform chain at this point, and your changes are passed on to the next transformer in the chain. The only restriction I know of is that the document has to conform to the schema when all is said and done. (A minimal Transformer sketch follows the quoted message below.) Best, Erick

On Fri, May 27, 2011 at 6:47 AM, Joe Fitzgerald joe_fitzger...@oxfordcorp.com wrote: > Hello, I am in an odd position. The application server I use has built-in integration with Solr. Unfortunately, its native capabilities are fairly limited; specifically, it only supports a standard/pre-defined set of fields which can be indexed. As a result, it has left me kludging how I work with Solr and doing things like putting what I'd like to be multiple, separate fields into a single Solr field. As an example, I may put a customer id and name into a single field called 'custom1'. Ideally, I'd like this information to be returned in separate fields... and even better would be for them to be indexed as separate fields, but I can live without the latter. Currently, I'm building out a JSON representation of this information, which makes it easy for me to deal with when I extract the results... but it all feels wrong. I do have complete control over the actual Solr installation (just not the indexing call to Solr), so I was hoping there may be a way to configure Solr to take my single field and split it up into a different field for each key in my JSON representation. I don't see anything native to Solr that would do this for me, but there are a few features that sounded similar, and I was hoping to get some opinions on how I may be able to move forward with this... Poly fields, such as the spatial location, might help? Can I build my own poly-field that would split up the main field into subfields? Do poly-fields let me return the subfields? I don't quite have my head around poly-fields yet. Another option, although I suspect this won't be considered a good approach: what about extending the copyField functionality of schema.xml to support my needs? It would seem not entirely unreasonable that copyField would provide a means to extract only a portion of the contents of the source field to place in the destination field, no? I'm sure people more familiar with Solr's architecture could explain why this isn't really an appropriate thing for Solr to handle (just because it could doesn't mean it should)... The other - and probably best - option would be to leverage Solr directly, bypassing the native integration of my application server, which we've already done for most cases. I'd love to go this route but I'm having a hard time figuring out how to easily accomplish the same functionality provided by my app server integration... perhaps someone on the list could help me with this path forward? Here is what I'm trying to accomplish: I'm indexing documents (text, pdf, html...), but I need to include fields in the results of my searches which are only available from a db query.
I know how to have Solr index the results of a db query, but I'm having trouble getting it to index the documents that are associated with each record of that query (the full path/filename is one of the fields of that query). I started by trying to use the DataImportHandler to do this, by setting up a FileDataSource in addition to my JDBC data source. I tried to leverage the FileDataSource to populate a sub-entity based on the db field that contains the full path/filename, but I wasn't sure how to specify the db field from the root query/entity. Before I spent too much time, I also realized I wasn't sure how to get Solr to deal with binary file types this way either, which upon further reading seemed to require leveraging Tika - can that be done within the confines of DataImportHandler? Any advice is greatly appreciated. Thanks in advance, Joe
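The sketch Erick's suggestion points at; the class name, the target field names, and the assumption that 'custom1' holds an "id|name" pair are all hypothetical, for illustration only:

    import java.util.Map;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    public class SplitFieldTransformer extends Transformer {
        @Override
        public Object transformRow(Map<String, Object> row, Context context) {
            Object raw = row.get("custom1");           // the combined field
            if (raw != null) {
                String[] parts = raw.toString().split("\\|", 2);
                row.put("customer_id", parts[0]);      // assumed target fields;
                if (parts.length > 1) {                // they must exist in schema.xml
                    row.put("customer_name", parts[1]);
                }
            }
            return row;
        }
    }

It would then be wired in with transformer="my.package.SplitFieldTransformer" on the entity in db-data-config.xml.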
Re: Edgengram
That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers. Best, Erick

On Fri, May 27, 2011 at 9:17 AM, Brian Lamb brian.l...@journalexperts.com wrote: > For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far. Thanks, Brian Lamb

On Wed, May 25, 2011 at 4:53 PM, Brian Lamb brian.l...@journalexperts.com wrote: >> Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="front" />
  </analyzer>
</fieldType>

I've also set up my own similarity class that returns 1 as the idf score. What I've found is that if I match a string abcdefg against a field containing abcdefghijklmnop, then the idf will score that as a 7: 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2) I get why that's happening, but is there a way to avoid it? Do I need a new field type to achieve the desired effect? Thanks, Brian Lamb
Re: applying FastVectorHighlighter truncation patch to solr 3.1
Did you try to apply the patch in Lucene's contrib? On Tuesday 17 May 2011 18:55:49 Paul wrote: I'm having this issue with solr 3.1: https://issues.apache.org/jira/browse/LUCENE-1824 It looks like there is a patch offered, but I can't figure out how to apply it. What is the easiest way for me to get this fix? I'm just using the example solr with changed conf xml files. Is there a file somewhere I can just drop in? -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Pivot with Stats (or Stats with Pivot)
Well, you can discuss it on the dev list, but given that Solr is open source, you'll have to either create your own patch or engage the community to create one. You haven't really stated why this is a good thing to have, just that you want it, so the use-case would be a big help. Please don't raise a JIRA until you've discussed it, though; it may be that there's something in the works that one of the devs already knows about... Best, Erick

On Fri, May 27, 2011 at 10:34 AM, edua...@calandra.com.br wrote: > Nobody? Please, help

On 17/05/2011 16:13, edua...@calandra.com.br wrote: > Hi All, Is it possible to get stats (like the Stats Component: min, max, sum, count, missing, sumOfSquares, mean and stddev) from numeric fields inside hierarchical facets (with more than one level, like pivot)? I would like to query:

...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z

and get min, max, sum, count, etc. from numeric_field1 and numeric_field2 for all combinations of field_x, field_y and field_z (hierarchical values). Using stats.facet I get just one field at one level, and using facet.pivot I get just counts, but no stats. Looping in the client application to do all combinations of facet values would be too slow because there are a lot of combinations. Thanks a lot!
Re: Documents update
http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html On Tuesday 31 May 2011 15:41:32 Denis Kuzmenok wrote: > Flags are stored to filter results, and the load is pretty high. It's working fine, but I can't update the index very often just to keep the flags up to date =\ Where can I read about using external fields / files? [...] -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
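For a concrete picture, an ExternalFileField is declared in schema.xml roughly like this; the field and file names here are assumptions:

    <fieldType name="extFlag" keyField="id" defVal="0"
               class="solr.ExternalFileField" valType="float"/>
    <field name="flag" type="extFlag" indexed="false" stored="false"/>

The values then live outside the index, in a file named external_flag in the index data directory, one key=value line per document (e.g. doc1=1). The file can be swapped out and picked up without reindexing, which is the point for frequently changing flags.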
Re: solr Invalid Date in Date Math String/Invalid Date String
Can we see the results of attaching debugQuery=on to the query? That often points out the issue. I'd expect this form to work: [2006-12-22T00:00:00Z TO 2006-12-22T23:59:59Z] Best, Erick

2011/5/27 Ellery Leung elleryle...@be-o.com: > Thank you Mike. So I understand that now. But what about the other items that have values on both sides? They don't work at all.

-----Original Message-----
From: Mike Sokolov [mailto:soko...@ifactory.com]
Sent: 2011年5月27日 10:23 下午
To: solr-user@lucene.apache.org
Cc: alucard001
Subject: Re: solr Invalid Date in Date Math String/Invalid Date String

The * endpoint for range terms wasn't implemented yet in 1.4.1. As a workaround, we use very large and very small values. -Mike

On 05/27/2011 12:55 AM, alucard001 wrote: > Hi all, I am using Solr 1.4.1 (according to solr info), but no matter what date field I use (date or tdate) as defined in the default schema.xml, I cannot do a search in solr-admin analysis.jsp:
> fieldtype: date (or tdate)
> fieldvalue (index): 2006-12-22T13:52:13Z (I type it in manually, no trailing space)
> fieldvalue (query): the only success case is 2006-12-22T13:52:13Z. All the searches below fail:
> * TO NOW
> [* TO NOW]
> 2006-12-22T00:00:00Z TO 2006-12-22T23:59:59Z
> 2006\-12\-22T00\:00\:00Z TO 2006\-12\-22T23\:59\:59Z
> [2006-12-22T00:00:00Z TO 2006-12-22T23:59:59Z]
> [2006\-12\-22T00\:00\:00Z TO 2006\-12\-22T23\:59\:59Z]
> 2006-12-22T00:00:00.000Z TO 2006-12-22T23:59:59.999Z
> 2006\-12\-22T00\:00\:00\.000Z TO 2006\-12\-22T23\:59\:59\.999Z
> [2006-12-22T00:00:00.000Z TO 2006-12-22T23:59:59.999Z]
> [2006\-12\-22T00\:00\:00\.000Z TO 2006\-12\-22T23\:59\:59\.999Z]
> 2006-12-22T00:00:00Z TO *
> 2006\-12\-22T00\:00\:00Z TO *
> [2006-12-22T00:00:00Z TO *]
> [2006\-12\-22T00\:00\:00Z TO *]
> 2006-12-22T00:00:00.000Z TO *
> 2006\-12\-22T00\:00\:00\.000Z TO *
> [2006-12-22T00:00:00.000Z TO *]
> [2006\-12\-22T00\:00\:00\.000Z TO *]
> (and vice versa)
> I get either an "Invalid Date in Date Math String" or an "Invalid Date String" error. What's wrong with it? Can anyone please help me with that? Thank you.
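A full request along the lines Erick suggests would look like this, where myDateField and the host are assumptions:

    http://localhost:8983/solr/select?q=myDateField:[2006-12-22T00:00:00Z TO 2006-12-22T23:59:59Z]&debugQuery=on

(with the spaces in the range URL-encoded as %20 when sent from a script).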
WIKI alerts
Has anyone else noticed that it doesn't work? It's been two weeks already: https://issues.apache.org/jira/browse/INFRA-3667 I don't receive wiki change notifications. I'm CCing 'Apache Wiki' wikidi...@apache.org. Something is wrong. -Fuad
Re: Documents update
Will it be slow if there are 3-5 million key/value rows? > http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html On Tuesday 31 May 2011 15:41:32 Denis Kuzmenok wrote: >> Flags are stored to filter results, and the load is pretty high. It's working fine, but I can't update the index very often just to keep the flags up to date =\ Where can I read about using external fields / files? [...]
Re: Match in the process of filter, not end, does it mean not matching?
Take a closer look at the results of KeywordTokenizerFactory. It won't break the text up into any tokens; the entire input is considered a single string. Are you sure this is what you intend? I'd start by removing most of your filters, understanding what's happening at each step, then adding them back in again. For instance, it's unusual (but possibly correct) to use both the MappingCharFilterFactory and the ISOLatin... factory. And I'm not even sure what all the *gram* filters are doing in a KeywordTokenized field. Best, Erick

On Sun, May 29, 2011 at 8:39 PM, Ellery Leung elleryle...@be-o.com wrote: > This is the schema:

<fieldType name="textContains" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="../../filters/filter-mappings.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.CommonGramsFilterFactory" words="../../filters/stopwords.txt" ignoreCase="true"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="30"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="30"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="../../filters/filter-mappings.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>

And there is a multiValued field:

<field name="textContains_Something" type="textContains" multiValued="true" indexed="true" stored="true" />

Now I want to search this string: Merry Christmas and Happy New Year. In Admin Analysis in solr admin, it highlights (in light blue) the matching word at LowerCaseFilterFactory, CommonGramsFilterFactory and ShingleFilterFactory. However, it does not have any highlight at NGramFilterFactory. Then I did a search in full-interface mode in solr admin: textContains_Something:Merry Christmas and Happy New Year. It returns NO RESULT. Does that mean matching only counts after all the tokenizers and filters have run? Thank you in advance for any help.
Re: collapse component with pivot faceting
Please provide a more detailed request. This is so general that it's hard to respond. What is the use-case you're trying to understand/implement? You might review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Mon, May 30, 2011 at 4:31 AM, Isha Garg isha.g...@orkash.com wrote: Hi All! Can anyone tell me how pivot faceting works in combination with field collapsing.? Please guide me in this respect. Thanks! Isha Garg
Re: Solr Dismax bf bq vs. q:{boost ...}
First, please define what "wrong results" means: what are you expecting and what are you seeing? Second, please post the results of debugQuery=on where we can all see it; perhaps something will pop out...

Best
Erick

On Mon, May 30, 2011 at 12:27 PM, chazzuka chazz...@gmail.com wrote:
I tried to do this:
#1. search phrases in title^3 text^1
#2. based on result #1, add a boost for field closed:0^2
#3. based on result #2, boost based on last_modified

and I tried it like this:

/solr/select?q={!boost b=$dateboost v=$qq defType=dismax}
  &dateboost=recip(ms(NOW/HOUR,modified),8640,2,1)
  &qq=video
  &qf=title^3+text
  &pf=title^3+text
  &bq=closed:0^2
  &debugQuery=true

then I tried differently by changing solrconfig like this:

<str name="qf">title^3 text</str>
<str name="pf">title^3 text</str>
<str name="bf">recip(ms(NOW/HOUR,modified),8640,2,1)</str>
<str name="bq">closed:0^2</str>

with query: /solr/select?q=video&debugQuery=true

Both seem to give wrong results. Anyone have an idea about doing those tasks? Thanks in advance

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Dismax-bf-bq-vs-q-boost-tp3003028p3003028.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Getting and viewing a heap dump
Constantjin: I've had better luck by sending messages as plain text. The Spam filter on the user list sometimes acts up if you send mail in richtext or similar formats. Gmail has a link to change this, what client are you using? And thanks for participating! Best Erick On Tue, May 31, 2011 at 3:22 AM, Constantijn Visinescu baeli...@gmail.com wrote: Hi Bernd, I'm assuming Linux here, if you're running something else these instructions might differ slightly. First get a heap dump with: jmap -heap:format=b,file=/path/to/generate/heapdumpfile.hprof 1234 with 1234 being the PID (process id) of the JVM After you get a Heap dump you can analyze it with Eclipse MAT (Memory Analyzer Tool). Just a heads up if you're doing this in production: the JVM will freeze completely while generating the heap dump, which will seem like a giant stop the world GC with a 10GB heap. Good luck with finding out what's eating your memory! Constantijn P.S. Sorry about altering the subject line, but the spam assassin used by the mailing list was rejecting my post because it had replication in the subject line. hope it doesn't mess up the thread. On Tue, May 31, 2011 at 8:43 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Some more info, after one week the servers have the following status: Master (indexing only) + looks good and has heap size of about 6g from 10g OldGen + has loaded meanwhile 2 times the index from scratch via DIH + has added new documents into existing index via DIH + has optimized and replicated + no full GC within one week Slave A (search only) Online - looks bad and has heap size of 9.5g from 10g OldGen + was replicated - several full GC Slave B (search only) Backup + looks good has heap size of 4 g from 10g OldGen + was replicated + no full GC within one week Conclusion: + DIH, processing, indexing, replication are fine - the search is crap and eats up OldGen heap which can't be cleaned up by full GC. May be memory leaks or what ever... Due to this Solr 3.1 can _NOT_ be recommended as high-availability, high-search-load search engine because of unclear heap problems caused by the search. The search is out of the box, so no self produced programming errors. Any tools available for JAVA to analyze this? (like valgrind or electric fence for C++) Is it possible to analyze a heap dump produced with jvisualvm? Which tools? Bernd Am 30.05.2011 15:51, schrieb Bernd Fehling: Dear list, after switching from FAST to Solr I get the first _real_ data. This includes search times, memory consumption, perfomance of solr,... What I recognized so far is that something eats up my OldGen and I assume it might be replication. Current Data: one master - indexing only two slaves - search only over 28 million docs single instance single core index size 140g current heap size 16g After startup I have about 4g heap in use and about 3.5g of OldGen. After one week and some replications OldGen is filled close to 100 percent. If I start an optimize under this condition I get OOM of heap. So my assumption is that something is eating up my heap. Any idea how to trace this down? May be a memory leak somewhere? Best regards Bernd
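One note on the jmap invocation above, since the syntax changed between JDKs: if your JVM rejects the -heap:format=b form (it is the old Java 5 spelling), the Java 6+ equivalent is the -dump option, and it can also be worth letting the JVM write the dump automatically when the OOM actually happens. Roughly:

    jmap -dump:format=b,file=/path/to/heapdumpfile.hprof 1234

    # JVM startup options, so an OutOfMemoryError leaves a dump behind
    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps

Both routes produce .hprof files that Eclipse MAT (and jvisualvm) can open.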
Re: how can i index data in different documents
<document> isn't a tag recognized in schema.xml. Please review: http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Thu, May 26, 2011 at 6:46 AM, Romi romijain3...@gmail.com wrote:
"Ensure that when you add your documents, their type value is effectively set to either table1 or table2." Did you mean I should set <document name="d1" type="table1"> in schema.xml??? But as far as I know there can only be one document tag, so what about table2??

- Romi
-- View this message in context: http://lucene.472066.n3.nabble.com/how-can-i-index-data-in-different-documents-tp2988621p2988789.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Getting and viewing a heap dump
I was sending using the gmail webbrowser client in plaintext. Spamassasin didn't seem to like 3 things (according to the error message i got back): - I use a free email adress - My email address (before the @) ends with a number - The email had the word replication in the subject line. No idea where rule #3 came from but it was the easiest to fix so that's what I changed ;) On Tue, May 31, 2011 at 4:21 PM, Erick Erickson erickerick...@gmail.com wrote: Constantjin: I've had better luck by sending messages as plain text. The Spam filter on the user list sometimes acts up if you send mail in richtext or similar formats. Gmail has a link to change this, what client are you using? And thanks for participating! Best Erick On Tue, May 31, 2011 at 3:22 AM, Constantijn Visinescu baeli...@gmail.com wrote: Hi Bernd, I'm assuming Linux here, if you're running something else these instructions might differ slightly. First get a heap dump with: jmap -heap:format=b,file=/path/to/generate/heapdumpfile.hprof 1234 with 1234 being the PID (process id) of the JVM After you get a Heap dump you can analyze it with Eclipse MAT (Memory Analyzer Tool). Just a heads up if you're doing this in production: the JVM will freeze completely while generating the heap dump, which will seem like a giant stop the world GC with a 10GB heap. Good luck with finding out what's eating your memory! Constantijn P.S. Sorry about altering the subject line, but the spam assassin used by the mailing list was rejecting my post because it had replication in the subject line. hope it doesn't mess up the thread. On Tue, May 31, 2011 at 8:43 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Some more info, after one week the servers have the following status: Master (indexing only) + looks good and has heap size of about 6g from 10g OldGen + has loaded meanwhile 2 times the index from scratch via DIH + has added new documents into existing index via DIH + has optimized and replicated + no full GC within one week Slave A (search only) Online - looks bad and has heap size of 9.5g from 10g OldGen + was replicated - several full GC Slave B (search only) Backup + looks good has heap size of 4 g from 10g OldGen + was replicated + no full GC within one week Conclusion: + DIH, processing, indexing, replication are fine - the search is crap and eats up OldGen heap which can't be cleaned up by full GC. May be memory leaks or what ever... Due to this Solr 3.1 can _NOT_ be recommended as high-availability, high-search-load search engine because of unclear heap problems caused by the search. The search is out of the box, so no self produced programming errors. Any tools available for JAVA to analyze this? (like valgrind or electric fence for C++) Is it possible to analyze a heap dump produced with jvisualvm? Which tools? Bernd Am 30.05.2011 15:51, schrieb Bernd Fehling: Dear list, after switching from FAST to Solr I get the first _real_ data. This includes search times, memory consumption, perfomance of solr,... What I recognized so far is that something eats up my OldGen and I assume it might be replication. Current Data: one master - indexing only two slaves - search only over 28 million docs single instance single core index size 140g current heap size 16g After startup I have about 4g heap in use and about 3.5g of OldGen. After one week and some replications OldGen is filled close to 100 percent. If I start an optimize under this condition I get OOM of heap. So my assumption is that something is eating up my heap. 
Any idea how to trace this down? May be a memory leak somewhere? Best regards Bernd
Re: Solr NRT
Did you try using Solr with RankingAlgorithm? It supports NRT. You can index documents without a commit while searching concurrently. No changes are needed except for enabling NRT through solrconfig.xml. You can get information about the implementation from here:
http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search
http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf
You can download Solr with RankingAlgorithm from here: http://solr-ra.tgels.com

Regards,
- Nagendra Nagarajayya
http://solr-ra.tgels.com
http://rankingalgorithm.tgels.com

On 5/31/2011 5:57 AM, Ionut Manta wrote:
What results did you get with this hack? How long does it take from when you start indexing some documents until you get a search result? Did you try NRT?

On Tue, May 31, 2011 at 3:47 PM, David Hill dh...@studentloan.org wrote:
Unless you cross a Solr server commit threshold, your client has to post a <commit/> message for the server content to become available for searching. Unfortunately the Solr tool that is supposed to do this apparently doesn't. I asked for community help last week and was surprised to receive no response; I thought having to leave a Solr import process in an incomplete state would be more of a concern. In any case, our (hopefully temporary) solution was to hack the source code for the SimplePostTool demo code to turn it into a CommitTool. Once Solr receives the <commit/> post you will be able to search for your recently added documents.

-----Original Message-----
From: Ionut Manta [mailto:ionut.ma...@gmail.com]
Sent: Tuesday, May 31, 2011 7:41 AM
To: solr-user@lucene.apache.org
Subject: Solr NRT

Hi, I have the following strange use case: index 100 documents and make them immediately available for search. I call this on-the-fly indexing. Then the index can be removed, so the size of the index is not an issue here. Is this possible with Solr? Anyone tried something similar? Thank you, Ionut

This e-mail and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the originator of the message. This footer also confirms that this e-mail message has been scanned for the presence of computer viruses. Any views expressed in this message are those of the individual sender, except where the sender specifies and with authority, states them to be the views of Iowa Student Loan.
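For reference, no custom tool is strictly needed to issue the commit; any HTTP client can post one (host and port here are the stock example values, adjust to your install):

    curl 'http://localhost:8983/solr/update?commit=true'

    # or equivalently, post the XML message explicitly
    curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<commit/>'

Alternatively, solrconfig.xml's <autoCommit> block (maxDocs/maxTime) can make the server commit on its own, at the cost of less control over exactly when documents become searchable.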
Re: Edgengram
In this particular case, I will be doing a solr search based on user preferences. So I will not be depending on the user to type abcdefg. That will be automatically generated based on user selections. The contents of the field do not contain spaces, and since I am creating the search parameters, case isn't important either.

Thanks,
Brian Lamb

On Tue, May 31, 2011 at 9:44 AM, Erick Erickson erickerick...@gmail.com wrote:
That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers.

Best
Erick

On Fri, May 27, 2011 at 9:17 AM, Brian Lamb brian.l...@journalexperts.com wrote:
For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far.

Thanks,
Brian Lamb

On Wed, May 25, 2011 at 4:53 PM, Brian Lamb brian.l...@journalexperts.com wrote:
Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="front"/>
  </analyzer>
</fieldType>

I've also set up my own similarity class that returns 1 as the idf score. What I've found is that if I match a string abcdefg against a field containing abcdefghijklmnop, then the idf will score that as a 7:

7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2)

I get why that's happening, but is there a way to avoid it? Do I need a new field type to achieve the desired effect?

Thanks,
Brian Lamb
how does Solr/Lucene index multi-value fields
Hi. I want to store a list of documents (say each being 30-60k of text) into a single SolrDocument. (to speed up post-retrieval querying) In order to do this, I need to know if lucene calculates the TF/IDF score over the entire field or does it treat each value in the list as a unique field? If I can't store it as a multi-value, I could create a schema where I put each document into a unique field, but I'm not sure how to create the query to search all the fields. Regards Ian
Better Spellcheck
I've tried to use a spellcheck dictionary built from my own content, but my content ends up having a lot of misspelled words, so the spellcheck ends up being less than effective. I could use a standard dictionary, but it may have problems with proper nouns. It also misses phrases: when someone searches for "Untied States" I would hope the spellcheck would suggest "United States", but it just recognizes that "untied" is a valid word and doesn't suggest anything. Is there any way around this? Are there any third-party modules or spellcheck systems that I could implement to get these types of features?
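One thing worth trying before reaching for third-party tools: an index-based spellchecker over a lightly analyzed copy field, with collation turned on so multi-word inputs come back reassembled. A sketch, not a drop-in config (the "spell" field name and the paths are illustrative, and the component still has to be hooked into your search handler via last-components):

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell</str>
        <str name="spellcheckIndexDir">./spellchecker</str>
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>

    /solr/select?q=untied states&spellcheck=true&spellcheck.collate=true&spellcheck.onlyMorePopular=true

spellcheck.onlyMorePopular is the relevant knob for the untied/united case: it asks for suggestions that are more frequent in the index than the query term even when the query term itself exists, so a valid-but-rare word can still be corrected, and spellcheck.collate stitches the per-word suggestions back into a whole query. It won't fix genuinely misspelled source content, though; for that, feeding a curated word list to a FileBasedSpellChecker is the usual workaround.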
Re: Edgengram
Can you specify the analyzer you are using for your queries? May be you could use a KeywordAnalyzer for your queries so you don't end up matching parts of your query. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ This should help you. On Tue, May 31, 2011 at 8:24 PM, Brian Lamb brian.l...@journalexperts.comwrote: In this particular case, I will be doing a solr search based on user preferences. So I will not be depending on the user to type abcdefg. That will be automatically generated based on user selections. The contents of the field do not contain spaces and since I am created the search parameters, case isn't important either. Thanks, Brian Lamb On Tue, May 31, 2011 at 9:44 AM, Erick Erickson erickerick...@gmail.com wrote: That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers Best Erick On Fri, May 27, 2011 at 9:17 AM, Brian Lamb brian.l...@journalexperts.com wrote: For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far. Thanks, Brian Lamb On Wed, May 25, 2011 at 4:53 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as: fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=100 side=front / /analyzer /fieldType I've also set up my own similarity class that returns 1 as the idf score. What I've found this does is if I match a string abcdefg against a field containing abcdefghijklmnop, then the idf will score that as a 7: 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2) I get why that's happening, but is there a way to avoid that? Do I need to do a new field type to achieve the desired affect? Thanks, Brian Lamb -- Thanks and Regards, DakshinaMurthy BM
Re: Edgengram
<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
</fieldType>

I believe I used that link when I initially set up the field and it worked great (and I'm still using it in other places). In this particular example, however, it does not appear to be practical for me. I mentioned that I have a similarity class that returns 1 for the idf, and in the case of an edgengram it returns 1 * the length of the search string.

Thanks,
Brian Lamb

On Tue, May 31, 2011 at 11:34 AM, bmdakshinamur...@gmail.com bmdakshinamur...@gmail.com wrote:
Can you specify the analyzer you are using for your queries? Maybe you could use a KeywordAnalyzer for your queries so you don't end up matching parts of your query. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ This should help you.

On Tue, May 31, 2011 at 8:24 PM, Brian Lamb brian.l...@journalexperts.com wrote:
In this particular case, I will be doing a solr search based on user preferences. So I will not be depending on the user to type abcdefg. That will be automatically generated based on user selections. The contents of the field do not contain spaces, and since I am creating the search parameters, case isn't important either. Thanks, Brian Lamb

On Tue, May 31, 2011 at 9:44 AM, Erick Erickson erickerick...@gmail.com wrote:
That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers. Best Erick

On Fri, May 27, 2011 at 9:17 AM, Brian Lamb brian.l...@journalexperts.com wrote:
For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far. Thanks, Brian Lamb

On Wed, May 25, 2011 at 4:53 PM, Brian Lamb brian.l...@journalexperts.com wrote:
Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as: <fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000"> <analyzer> <tokenizer class="solr.LowerCaseTokenizerFactory"/> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="front"/> </analyzer> </fieldType> I've also set up my own similarity class that returns 1 as the idf score. What I've found is that if I match a string abcdefg against a field containing abcdefghijklmnop, then the idf will score that as a 7: 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2) I get why that's happening, but is there a way to avoid it? Do I need a new field type to achieve the desired effect? Thanks, Brian Lamb

-- Thanks and Regards, DakshinaMurthy BM
Boosting fields at query time in Standard Request Handler from Solrconfig.xml
Hi, I am developing a search engine app using Asp.Net, C# and Solrnet. I use the standard request handler. Is there a way I can boost the fields at query time from inside the solrconfig.xml file itself, just like the qf field for the Dismax handler? Right now I am searching like field1:value^1.5 field2:value^1.2 field3:value^0.8, and this is done in the middle tier. I want Solr itself to do this using the standard request handler. Can I write a similar kind of thing inside the standard request handler? Here is my solrconfig file:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="hl">true</str>
    <str name="hl.snippets">3</str>
    <str name="hl.fragsize">25</str>
    <str name="qf">file_description^100.0 file_content^6.0 file_name^10.0 file_comments^4.0</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

But I am not able to see the results if I add this to my solrconfig.xml file. I have edited the post to add my request handler code in solrconfig. But if I have my query string as file_description:result^1.0 file_content:result^0.6 file_name:result^0.5 file_comments:result^0.8, I am able to see the required result.

Regards
Vignesh
Re: how does Solr/Lucene index multi-value fields
Can you explain the use-case a bit more here? Especially the post-query processing and how you expect the multiple documents to help here. But TF/IDF is calculated over all the values in the field. There's really no difference between a multi-valued field and storing all the data in a single field as far as relevance calculations are concerned. Best Erick On Tue, May 31, 2011 at 11:02 AM, Ian Holsman had...@holsman.net wrote: Hi. I want to store a list of documents (say each being 30-60k of text) into a single SolrDocument. (to speed up post-retrieval querying) In order to do this, I need to know if lucene calculates the TF/IDF score over the entire field or does it treat each value in the list as a unique field? If I can't store it as a multi-value, I could create a schema where I put each document into a unique field, but I'm not sure how to create the query to search all the fields. Regards Ian
Re: how does Solr/Lucene index multi-value fields
On May 31, 2011, at 12:11 PM, Erick Erickson wrote:
Can you explain the use-case a bit more here? Especially the post-query processing and how you expect the multiple documents to help here.

We have a collection of related stories. When a user searches for something, we might not want to display the story that is most relevant (according to SOLR), but one chosen according to other home-grown rules. By combining all the possibilities in one SolrDocument, we can avoid a DB hit to get related stories.

But TF/IDF is calculated over all the values in the field. There's really no difference between a multi-valued field and storing all the data in a single field as far as relevance calculations are concerned.

So... it will suck regardless. I thought we had per-field relevance in the current trunk. :-(

Best
Erick

On Tue, May 31, 2011 at 11:02 AM, Ian Holsman had...@holsman.net wrote:
Hi. I want to store a list of documents (say each being 30-60k of text) into a single SolrDocument (to speed up post-retrieval querying). In order to do this, I need to know whether lucene calculates the TF/IDF score over the entire field or treats each value in the list as a unique field. If I can't store it as a multi-value, I could create a schema where I put each document into a unique field, but I'm not sure how to create the query to search all the fields. Regards Ian
Re: Edgengram
Hi Brian,
I don't know if I understand what you are trying to achieve. You want the term query abcdefg to have an idf of 1 instead of 7? I think using the KeywordTokenizerFactory at query time should work. It would be something like:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

This way, at query time abcdefg won't be turned into a ab abc abcd abcde abcdef abcdefg. At index time it will.

Regards,
Tomás

On Tue, May 31, 2011 at 1:07 PM, Brian Lamb brian.l...@journalexperts.com wrote:
<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000"> <analyzer> <tokenizer class="solr.LowerCaseTokenizerFactory"/> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/> </analyzer> </fieldType> I believe I used that link when I initially set up the field and it worked great (and I'm still using it in other places). In this particular example, however, it does not appear to be practical for me. I mentioned that I have a similarity class that returns 1 for the idf, and in the case of an edgengram it returns 1 * the length of the search string. Thanks, Brian Lamb

On Tue, May 31, 2011 at 11:34 AM, bmdakshinamur...@gmail.com bmdakshinamur...@gmail.com wrote:
Can you specify the analyzer you are using for your queries? Maybe you could use a KeywordAnalyzer for your queries so you don't end up matching parts of your query. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ This should help you.

On Tue, May 31, 2011 at 8:24 PM, Brian Lamb brian.l...@journalexperts.com wrote:
In this particular case, I will be doing a solr search based on user preferences. So I will not be depending on the user to type abcdefg. That will be automatically generated based on user selections. The contents of the field do not contain spaces, and since I am creating the search parameters, case isn't important either. Thanks, Brian Lamb

On Tue, May 31, 2011 at 9:44 AM, Erick Erickson erickerick...@gmail.com wrote:
That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers. Best Erick

On Fri, May 27, 2011 at 9:17 AM, Brian Lamb brian.l...@journalexperts.com wrote:
For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far. Thanks, Brian Lamb

On Wed, May 25, 2011 at 4:53 PM, Brian Lamb brian.l...@journalexperts.com wrote:
Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as: <fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000"> <analyzer> <tokenizer class="solr.LowerCaseTokenizerFactory"/> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="front"/> </analyzer> </fieldType> I've also set up my own similarity class that returns 1 as the idf score.
Re: Edgengram
...or also use the LowerCaseTokenizerFactory at query time for consistency, but not the edge ngram filter. 2011/5/31 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Brian, I don't know if I understand what you are trying to achieve. You want the term query abcdefg to have an idf of 1 insead of 7? I think using the KeywordTokenizerFilterFactory at query time should work. I would be something like: fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer type=index tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=25 side=front / /analyzer analyzer type=query tokenizer class=solr.KeywordTokenizerFactory / /analyzer /fieldType this way, at query time abcdefg won't be turned to a ab abc abcd abcde abcdef abcdefg. At index time it will. Regards, Tomás On Tue, May 31, 2011 at 1:07 PM, Brian Lamb brian.l...@journalexperts.com wrote: fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=25 side=front / /analyzer /fieldType I believe I used that link when I initially set up the field and it worked great (and I'm still using it in other places). In this particular example however it does not appear to be practical for me. I mentioned that I have a similarity class that returns 1 for the idf and in the case of an edgengram, it returns 1 * length of the search string. Thanks, Brian Lamb On Tue, May 31, 2011 at 11:34 AM, bmdakshinamur...@gmail.com bmdakshinamur...@gmail.com wrote: Can you specify the analyzer you are using for your queries? May be you could use a KeywordAnalyzer for your queries so you don't end up matching parts of your query. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ This should help you. On Tue, May 31, 2011 at 8:24 PM, Brian Lamb brian.l...@journalexperts.comwrote: In this particular case, I will be doing a solr search based on user preferences. So I will not be depending on the user to type abcdefg. That will be automatically generated based on user selections. The contents of the field do not contain spaces and since I am created the search parameters, case isn't important either. Thanks, Brian Lamb On Tue, May 31, 2011 at 9:44 AM, Erick Erickson erickerick...@gmail.com wrote: That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers Best Erick On Fri, May 27, 2011 at 9:17 AM, Brian Lamb brian.l...@journalexperts.com wrote: For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far. Thanks, Brian Lamb On Wed, May 25, 2011 at 4:53 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as: fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=100 side=front / /analyzer /fieldType I've also set up my own similarity class that returns 1 as the idf score. 
What I've found this does is if I match a string abcdefg against a field containing abcdefghijklmnop, then the idf will score that as a 7: 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2) I get why that's happening, but is there a way to avoid that? Do I need to do a new field type to achieve the desired affect? Thanks, Brian Lamb -- Thanks and Regards, DakshinaMurthy BM
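To make the index/query asymmetry above concrete, here is roughly what the two analyzers produce (the token listing is illustrative, not a literal copy of the admin analysis output):

    index time, input "abcdefghijklmnop":
      LowerCaseTokenizer -> abcdefghijklmnop
      EdgeNGramFilter    -> a, ab, abc, abcd, ... (up to maxGramSize chars)

    query time, input "abcdefg":
      KeywordTokenizer   -> abcdefg   (a single token)

So the query contributes exactly one term, which matches the single indexed gram "abcdefg"; an idf-like statistic computed per query term then counts once instead of seven times.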
Re: how does Solr/Lucene index multi-value fields
On 5/31/2011 12:16 PM, Ian Holsman wrote: we have a collection of related stories. when a user searches for something, we might not want to display the story that is most-relevant (according to SOLR), but according to other home-grown rules. by combing all the possibilities in one SolrDocument, we can avoid a DB-hit to get related stories. Avoiding a DB hit may or may not actually be a good goal here. You may find that hitting the DB to get related stories is _more performant_ than retrieving a very large stored field from Solr. (My sense is this can be especially a problem on a Solr index that has not been optimized, but I'm not sure). Sorry, don't have an answer to your actual question, but if an attempted performance improvement is making other things harder... might want to be sure your presumed performance improvement really is a performance improvement.
Re: how does Solr/Lucene index multi-value fields
Hmmm, I may have misled you. Re-reading my text, it wasn't very well written. TF/IDF calculations are, indeed, per-field. I was trying to say that there is no difference between storing all the data for an individual field as a single long string of text in a single-valued field and storing it as several shorter strings in a multi-valued field.

Best
Erick

On Tue, May 31, 2011 at 12:16 PM, Ian Holsman had...@holsman.net wrote:
On May 31, 2011, at 12:11 PM, Erick Erickson wrote: Can you explain the use-case a bit more here? Especially the post-query processing and how you expect the multiple documents to help here. We have a collection of related stories. When a user searches for something, we might not want to display the story that is most relevant (according to SOLR), but one chosen according to other home-grown rules. By combining all the possibilities in one SolrDocument, we can avoid a DB hit to get related stories. But TF/IDF is calculated over all the values in the field. There's really no difference between a multi-valued field and storing all the data in a single field as far as relevance calculations are concerned. So... it will suck regardless. I thought we had per-field relevance in the current trunk. :-( Best Erick On Tue, May 31, 2011 at 11:02 AM, Ian Holsman had...@holsman.net wrote: Hi. I want to store a list of documents (say each being 30-60k of text) into a single SolrDocument (to speed up post-retrieval querying). In order to do this, I need to know whether lucene calculates the TF/IDF score over the entire field or treats each value in the list as a unique field. If I can't store it as a multi-value, I could create a schema where I put each document into a unique field, but I'm not sure how to create the query to search all the fields. Regards Ian
Re: Custom Scoring relying on another server.
bump -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-Scoring-relying-on-another-server-tp2994546p3006873.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how does Solr/Lucene index multi-value fields
Thanks Erick. Sadly, in my use-case that still wouldn't work. I'll go back to storing them at the story level and hitting a DB to get related stories, I think.

--I

On May 31, 2011, at 12:27 PM, Erick Erickson wrote:
Hmmm, I may have misled you. Re-reading my text, it wasn't very well written. TF/IDF calculations are, indeed, per-field. I was trying to say that there is no difference between storing all the data for an individual field as a single long string of text in a single-valued field and storing it as several shorter strings in a multi-valued field. Best Erick

On Tue, May 31, 2011 at 12:16 PM, Ian Holsman had...@holsman.net wrote:
On May 31, 2011, at 12:11 PM, Erick Erickson wrote: Can you explain the use-case a bit more here? Especially the post-query processing and how you expect the multiple documents to help here. We have a collection of related stories. When a user searches for something, we might not want to display the story that is most relevant (according to SOLR), but one chosen according to other home-grown rules. By combining all the possibilities in one SolrDocument, we can avoid a DB hit to get related stories. But TF/IDF is calculated over all the values in the field. There's really no difference between a multi-valued field and storing all the data in a single field as far as relevance calculations are concerned. So... it will suck regardless. I thought we had per-field relevance in the current trunk. :-( Best Erick On Tue, May 31, 2011 at 11:02 AM, Ian Holsman had...@holsman.net wrote: Hi. I want to store a list of documents (say each being 30-60k of text) into a single SolrDocument (to speed up post-retrieval querying). In order to do this, I need to know whether lucene calculates the TF/IDF score over the entire field or treats each value in the list as a unique field. If I can't store it as a multi-value, I could create a schema where I put each document into a unique field, but I'm not sure how to create the query to search all the fields. Regards Ian
Using multiple CPUs for a single document base?
Is there a way to allow Solr to use multiple CPUs of a single, multi-core box, to increase scale (number of documents, number of searches) of the searchbase? The CoreAdmin wiki page talks about Multiple Cores as essentially independent document bases with independent indexes, but with some unification of administration at the grosser levels. That's not quite what I'm looking for, though. I want a single URL for add and search access, and a single logical searchbase, but I want to be able to use more of the resources of the physical box where the searchbase runs. I guess I thought I would get this for free, it being Java and all, but I don't seem to: even with hundreds of clients adding and searching, I only seem to use one hardware core, and a bit of a second (which I interpret to mean one Java thread for Solr, one Java thread for Java I/O). -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
Re: Using multiple CPUs for a single document base?
Are you using a 1.4 version of Solr? It has since been improved for multi-threaded scalability, with a lot of neat non-blocking components. It's been a long time since I saw a Solr server not taking advantage of multiple cores. Today they take 200% up to even 1200% CPU time when viewed with top. Here's one running at almost 700%, processing 450+ queries/second:

15522 markus20 0 2432m 859m 10m S 698 10.8 2:05.41 java

Is there a way to allow Solr to use multiple CPUs of a single, multi-core box, to increase scale (number of documents, number of searches) of the searchbase? The CoreAdmin wiki page talks about Multiple Cores as essentially independent document bases with independent indexes, but with some unification of administration at the grosser levels. That's not quite what I'm looking for, though. I want a single URL for add and search access, and a single logical searchbase, but I want to be able to use more of the resources of the physical box where the searchbase runs. I guess I thought I would get this for free, it being Java and all, but I don't seem to: even with hundreds of clients adding and searching, I only seem to use one hardware core, and a bit of a second (which I interpret to mean one Java thread for Solr, one Java thread for Java I/O). -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
Obtaining query AST?
Hi, I want to write my own query expander. It needs to obtain the AST (abstract syntax tree) of an already parsed query string, navigate to certain parts of it (words), and make logical phrases of those words by adding to the AST where necessary. This cannot be done on the raw string, because the query logic (e.g. AND, OR, parens, etc.) cannot be semantically altered, so it must be parsed first. How can this be done with SolrJ? Thanks for any tips. Darren
Re: Using multiple CPUs for a single document base?
On May 31, 2011, at 11:16 AM, Markus Jelsma wrote: Are you using a 1.4 version of Solr? Yeah, about those version numbers ... The tarball I installed claimed its version was apache-solr-3.1.0 Which sounds comfortably later than 1.4. But the examples/solr/schema.xml that comes with it claims version 1.3. I'm confused. -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
Re: Using multiple CPUs for a single document base?
Yeah, ignore the 'multiple cores' you are seeing in the docs there; that's about something else unrelated to CPUs and, as you discovered, has nothing to do with what you're asking about. Put it out of your mind.

I kind of think you should get multi-CPU use 'for free' as a Java app too. It does for me: heavy Solr usage, and when I look at my stats, multiple CPU cores are being exercised. (Of course, this is never going to be perfectly efficient; you aren't going to get double performance by doubling the CPUs in one box, there are various bottlenecks, as we all know.)

There is also some Java GC tuning you want to do with multiple cores; the default JVM settings aren't usually appropriate. (You want to background-thread your GC, I forget the magic JVM invocations.) But that's probably not related to your issue if you don't even see more than one CPU being exercised at all, weird.

On 5/31/2011 1:44 PM, Jack Repenning wrote:
Is there a way to allow Solr to use multiple CPUs of a single, multi-core box, to increase scale (number of documents, number of searches) of the searchbase? The CoreAdmin wiki page talks about Multiple Cores as essentially independent document bases with independent indexes, but with some unification of administration at the grosser levels. That's not quite what I'm looking for, though. I want a single URL for add and search access, and a single logical searchbase, but I want to be able to use more of the resources of the physical box where the searchbase runs. I guess I thought I would get this for free, it being Java and all, but I don't seem to: even with hundreds of clients adding and searching, I only seem to use one hardware core, and a bit of a second (which I interpret to mean one Java thread for Solr, one Java thread for Java I/O). -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
Re: Using multiple CPUs for a single document base?
1.3 is the schema version. It hasn't had as many upgrades. Solr 3.1 uses the 1.3 schema version, Solr 1.4.x uses the 1.2 schema version. On May 31, 2011, at 11:16 AM, Markus Jelsma wrote: Are you using a 1.4 version of Solr? Yeah, about those version numbers ... The tarball I installed claimed its version was apache-solr-3.1.0 Which sounds comfortably later than 1.4. But the examples/solr/schema.xml that comes with it claims version 1.3. I'm confused. -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
Re: Using multiple CPUs for a single document base?
On May 31, 2011, at 11:29 AM, Jonathan Rochkind wrote: I kind of think you should get multi-CPU use 'for free' as a Java app too. Ah, probably experimental error? If I apply a stress load consisting only of queries, I get automatic multi-core use as expected. I could see where indexing new dox could tend toward synchronization and uniprocessing. Perhaps my original test load was too add-centric, does that make sense? -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
Re: Obtaining query AST?
I believe there is a query parser that accepts queries formatted in XML, allowing you to provide a parse tree to Solr; perhaps that would get you the control you're after. -Mike On 05/31/2011 02:24 PM, dar...@ontrenet.com wrote: Hi, I want to write my own query expander. It needs to obtain the AST (abstract syntax tree) of an already parsed query string, navigate to certain parts of it (words) and make logical phrases of those words by adding to the AST - where necessary. This cannot be done to the string because the query logic cannot be semantically altered. (e.g. AND, OR, paren's etc) so it must be parsed first. How can this be done with SolrJ? thanks for any tips. Darren
Re: Obtaining query AST?
Hi, thanks for the tip. I noticed the XML stuff, but the trouble is I am taking a query string entered by a user such as this OR that AND (this AND that) so I'm not sure how to go from that to a representational AST parse tree... I believe there is a query parser that accepts queries formatted in XML, allowing you to provide a parse tree to Solr; perhaps that would get you the control you're after. -Mike On 05/31/2011 02:24 PM, dar...@ontrenet.com wrote: Hi, I want to write my own query expander. It needs to obtain the AST (abstract syntax tree) of an already parsed query string, navigate to certain parts of it (words) and make logical phrases of those words by adding to the AST - where necessary. This cannot be done to the string because the query logic cannot be semantically altered. (e.g. AND, OR, paren's etc) so it must be parsed first. How can this be done with SolrJ? thanks for any tips. Darren
Re: Using multiple CPUs for a single document base?
Yep, that could be it. You certainly don't get _great_ concurrency support 'for free' in Java; concurrent programming is still tricky, and parts of Solr are surely better at it than others. The one place I'd be shocked is multiple concurrent queries, if those weren't helped by multiple CPUs. Multiple CPUs won't necessarily speed up any single query; they should just speed up the overall situation under heavy load. Which has been my observation. And it may be that multiple CPUs don't speed up add/commit much, as you possibly have observed.

I do all my 'adds' to a separate Solr index, and then replicate to a slave that actually serves queries. My 'master' that I do my adds to is actually on the very same server -- but I run it in an entirely different java container, in part to minimize any chance that it will end up competing for threads/CPUs with the slave serving queries; the OS level alone should ('should', famous last words) balance it to a different cpu core. (Of course, there's still only so much total CPU available on the machine.)

On 5/31/2011 2:53 PM, Jack Repenning wrote:
On May 31, 2011, at 11:29 AM, Jonathan Rochkind wrote: I kind of think you should get multi-CPU use 'for free' as a Java app too. Ah, probably experimental error? If I apply a stress load consisting only of queries, I get automatic multi-core use as expected. I could see where indexing new dox could tend toward synchronization and uniprocessing. Perhaps my original test load was too add-centric, does that make sense? -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
Re: Obtaining query AST?
You're going to have to parse it yourself. Or, since Solr is open source, you can take pieces of the existing query parsers (dismax or lucene), and repurpose them. But I don't _think_ (I could be wrong) there is any public API in Solr/SolrJ that will give you an AST. On 5/31/2011 3:18 PM, dar...@ontrenet.com wrote: Hi, thanks for the tip. I noticed the XML stuff, but the trouble is I am taking a query string entered by a user such as this OR that AND (this AND that) so I'm not sure how to go from that to a representational AST parse tree... I believe there is a query parser that accepts queries formatted in XML, allowing you to provide a parse tree to Solr; perhaps that would get you the control you're after. -Mike On 05/31/2011 02:24 PM, dar...@ontrenet.com wrote: Hi, I want to write my own query expander. It needs to obtain the AST (abstract syntax tree) of an already parsed query string, navigate to certain parts of it (words) and make logical phrases of those words by adding to the AST - where necessary. This cannot be done to the string because the query logic cannot be semantically altered. (e.g. AND, OR, paren's etc) so it must be parsed first. How can this be done with SolrJ? thanks for any tips. Darren
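There is no public SolrJ call that hands back a parse tree, but if doing the expansion client-side is an option, the Lucene query parser that ships with Solr already returns a Query object tree that can be walked and rebuilt: parentheses and operators come back as nested BooleanQuery clauses. A rough sketch against the Lucene 3.x API (the "text" field name and the analyzer choice are placeholders):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.util.Version;

    public class QueryTreeWalker {
      public static void main(String[] args) throws Exception {
        QueryParser parser = new QueryParser(Version.LUCENE_31, "text",
            new StandardAnalyzer(Version.LUCENE_31));
        Query q = parser.parse("this OR that AND (this AND that)");
        walk(q, 0);
      }

      // Recursively print the parsed tree; a real expander would rebuild
      // nodes here, e.g. swap a TermQuery for a PhraseQuery.
      static void walk(Query q, int depth) {
        String pad = new String(new char[depth * 2]).replace('\0', ' ');
        if (q instanceof BooleanQuery) {
          System.out.println(pad + "BooleanQuery");
          for (BooleanClause c : ((BooleanQuery) q).clauses()) {
            System.out.println(pad + "  occur=" + c.getOccur());
            walk(c.getQuery(), depth + 2);
          }
        } else if (q instanceof TermQuery) {
          System.out.println(pad + "term=" + ((TermQuery) q).getTerm());
        } else {
          System.out.println(pad + q.getClass().getSimpleName() + ": " + q);
        }
      }
    }

One caveat: what comes back reflects Lucene's own operator precedence rules, not necessarily the user's mental parse, so it's worth checking the tree against debugQuery output before relying on it.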
Re: Splitting fields
Hi, Write a custom UpdateProcessor, which gives you full control of the SolrDocument prior to indexing. The best would be if you write a generic FieldSplitterProcessor which is configurable on what field to take as input, what delimiter or regex to split on and finally what fields to write the result to. This way other may re-use your code for their splitting needs. See http://wiki.apache.org/solr/UpdateRequestProcessor and http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 27. mai 2011, at 15.47, Joe Fitzgerald wrote: Hello, I am in an odd position. The application server I use has built-in integration with SOLR. Unfortunately, its native capabilities are fairly limited, specifically, it only supports a standard/pre-defined set of fields which can be indexed. As a result, it has left me kludging how I work with Solr and doing things like putting what I'd like to be multiple, separate fields into a single Solr field. As an example, I may put a customer id and name into a single field called 'custom1'. Ideally, I'd like this information to be returned in separate fields...and even better would be for them to be indexed as separate fields but I can live without the latter. Currently, I'm building out a json representation of this information which makes it easy for me to deal with when I extract the results...but it all feels wrong. I do have complete control over the actual Solr installation (just not the indexing call to Solr), so I was hoping there may be a way to configure Solr to take my single field and split it up into a different field for each key in my json representation. I don't see anything native to Solr that would do this for me but there are a few features that I thought sounded similar and was hoping to get some opinions on how I may be able to move forward with this... Poly fields, such as the spatial location, might help? Can I build my own poly-field that would split up the main field into subfields? Do poly-fields let me return the subfields? I don't quite have my head around polyfields yet. Another option although I suspect this won't be considered a good approach, but what about extending the copyField functionality of schema.xml to support my needs? It would seem not entirely unreasonable that copyField would provide a means to extract only a portion of the contents of the source field to place in the destination field, no? I'm sure people more familiar with Solr's architecture could explain why this isn't really an appropriate thing for Solr to handle (just because it could doesn't mean it should)... The other - and probably best -- option would be to leverage Solr directly, bypassing the native integration of my application server, which we've already done for most cases. I'd love to go this route but I'm having a hard time figuring out how to easily accomplish the same functionality provided by my app server integration...perhaps someone on the list could help me with this path forward? Here is what I'm trying to accomplish: I'm indexing documents (text, pdf, html...) but I need to include fields in the results of my searches which are only available from a db query. I know how to have Solr index results from a db query, but I'm having trouble getting it to index the documents that are associated to each record of that query (full path/filename is one of the fields of that query). 
I started to try to use the dataImport handler to do this, by setting up a FileDataSource in addition to my jdbc data source. I tried to leverage the filedatasource to populate a sub-entity based on the db field that contains the full path/filename, but I wasn't sure how to specify the db field from the root query/entity. Before I spent too much time, I also realized I wasn't sure how to get Solr to deal with binary file types this way either which upon further reading seemed like I would need to leverage Tika - can that be done within the confines of dataimporthandler? Any advice is greatly appreciated. Thanks in advance, Joe
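Following up on Jan's suggestion above, a skeleton of such a processor might look like the sketch below (Solr 3.x API; the source field "custom1", the "|" delimiter and the target field names are purely illustrative):

    import java.io.IOException;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class FieldSplitterProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();
            Object raw = doc.getFieldValue("custom1");        // combined field
            if (raw != null) {
              String[] parts = raw.toString().split("\\|", 2); // e.g. "42|Acme Corp"
              if (parts.length == 2) {
                doc.setField("customer_id", parts[0].trim());
                doc.setField("customer_name", parts[1].trim());
              }
            }
            super.processAdd(cmd);                             // continue the chain
          }
        };
      }
    }

Wired into solrconfig.xml, roughly:

    <updateRequestProcessorChain name="split">
      <processor class="com.example.FieldSplitterProcessorFactory"/>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

The chain then has to be referenced from your update handler; the request parameter naming it changed across releases, so check the wiki pages above for your version.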
Re: Using multiple CPUs for a single document base?
If you use only one thread when indexing then only one core is going to be used.

On May 31, 2011, at 11:29 AM, Jonathan Rochkind wrote: I kind of think you should get multi-CPU use 'for free' as a Java app too. Ah, probably experimental error? If I apply a stress load consisting only of queries, I get automatic multi-core use as expected. I could see where indexing new dox could tend toward synchronization and uniprocessing. Perhaps my original test load was too add-centric, does that make sense? -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
Re: Boosting fields at query time in Standard Request Handler from Solrconfig.xml
Hi,
You need to add <str name="defType">edismax</str>

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 31. mai 2011, at 18.08, Vignesh Raj wrote:
Hi, I am developing a search engine app using Asp.Net, C# and Solrnet. I use the standard request handler. Is there a way I can boost the fields at query time from inside the solrconfig.xml file itself, just like the qf field for the Dismax handler? Right now I am searching like field1:value^1.5 field2:value^1.2 field3:value^0.8, and this is done in the middle tier. I want Solr itself to do this using the standard request handler. Can I write a similar kind of thing inside the standard request handler? Here is my solrconfig file:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="hl">true</str>
    <str name="hl.snippets">3</str>
    <str name="hl.fragsize">25</str>
    <str name="qf">file_description^100.0 file_content^6.0 file_name^10.0 file_comments^4.0</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

But I am not able to see the results if I add this to my solrconfig.xml file. I have edited the post to add my request handler code in solrconfig. But if I have my query string as file_description:result^1.0 file_content:result^0.6 file_name:result^0.5 file_comments:result^0.8, I am able to see the required result.

Regards
Vignesh
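In other words, with edismax (available from Solr 3.1) the handler from the original post becomes roughly the following; same field boosts, only defType added:

    <requestHandler name="standard" class="solr.SearchHandler" default="true">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="echoParams">explicit</str>
        <str name="hl">true</str>
        <str name="hl.snippets">3</str>
        <str name="hl.fragsize">25</str>
        <str name="qf">file_description^100.0 file_content^6.0 file_name^10.0 file_comments^4.0</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>

After that, a plain /solr/select?q=result query is expanded across the qf fields with the listed boosts. The reason the original config appeared to do nothing is that qf is only honored by the dismax/edismax query parsers; the default lucene parser silently ignores it.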
Re: Using multiple CPUs for a single document base?
You say it like it's something you have control over; how would one choose to use more than one thread when indexing? I guess maybe it depends on how you're indexing of course; I guess if you're using SolrJ it's straightforward. What if you're using the ordinary HTTP Post interface, or DIH? On 5/31/2011 3:35 PM, Markus Jelsma wrote: If you use only one thread when indexing then one one core is going to be used. On May 31, 2011, at 11:29 AM, Jonathan Rochkind wrote: I kind of think you should get multi-CPU use 'for free' as a Java app too. Ah, probably experimental error? If I apply a stress load consisting only of queries, I get automatic multi-core use as expected. I could see where indexing new dox could tend toward synchronization and uniprocessing. Perhaps my original test load was too add-centric, does that make sense? -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
Re: Using multiple CPUs for a single document base?
On May 31, 2011, at 12:24 PM, Jonathan Rochkind wrote: I do all my 'adds' to a separate Solr index, and then replicate to a slave that actually serves queries. Yes, that's a step I'm holding in reserve. Probably get there some day, as I expect always to have a very high add-to-query ratio. But for the moment, I don't think I need it. My 'master' that I do my adds to is actually on the very same server -- but I run it in an entirely different java container, Now THAT was an interesting data point, thanks very much! I hadn't thought of running the master on the same box! -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
Re: Splitting fields
I'd go for this option as well. The example update processor can't make it more easier and it's a very flexible approach. Judging from the patch in SOLR-2105 it should still work with the current 3.2 branch. https://issues.apache.org/jira/browse/SOLR-2105 Hi, Write a custom UpdateProcessor, which gives you full control of the SolrDocument prior to indexing. The best would be if you write a generic FieldSplitterProcessor which is configurable on what field to take as input, what delimiter or regex to split on and finally what fields to write the result to. This way other may re-use your code for their splitting needs. See http://wiki.apache.org/solr/UpdateRequestProcessor and http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_sect ion -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 27. mai 2011, at 15.47, Joe Fitzgerald wrote: Hello, I am in an odd position. The application server I use has built-in integration with SOLR. Unfortunately, its native capabilities are fairly limited, specifically, it only supports a standard/pre-defined set of fields which can be indexed. As a result, it has left me kludging how I work with Solr and doing things like putting what I'd like to be multiple, separate fields into a single Solr field. As an example, I may put a customer id and name into a single field called 'custom1'. Ideally, I'd like this information to be returned in separate fields...and even better would be for them to be indexed as separate fields but I can live without the latter. Currently, I'm building out a json representation of this information which makes it easy for me to deal with when I extract the results...but it all feels wrong. I do have complete control over the actual Solr installation (just not the indexing call to Solr), so I was hoping there may be a way to configure Solr to take my single field and split it up into a different field for each key in my json representation. I don't see anything native to Solr that would do this for me but there are a few features that I thought sounded similar and was hoping to get some opinions on how I may be able to move forward with this... Poly fields, such as the spatial location, might help? Can I build my own poly-field that would split up the main field into subfields? Do poly-fields let me return the subfields? I don't quite have my head around polyfields yet. Another option although I suspect this won't be considered a good approach, but what about extending the copyField functionality of schema.xml to support my needs? It would seem not entirely unreasonable that copyField would provide a means to extract only a portion of the contents of the source field to place in the destination field, no? I'm sure people more familiar with Solr's architecture could explain why this isn't really an appropriate thing for Solr to handle (just because it could doesn't mean it should)... The other - and probably best -- option would be to leverage Solr directly, bypassing the native integration of my application server, which we've already done for most cases. I'd love to go this route but I'm having a hard time figuring out how to easily accomplish the same functionality provided by my app server integration...perhaps someone on the list could help me with this path forward? Here is what I'm trying to accomplish: I'm indexing documents (text, pdf, html...) but I need to include fields in the results of my searches which are only available from a db query. 
I know how to have Solr index results from a db query, but I'm having trouble getting it to index the documents that are associated with each record of that query (full path/filename is one of the fields of that query). I started to try to use the DataImportHandler to do this, by setting up a FileDataSource in addition to my JDBC data source. I tried to leverage the FileDataSource to populate a sub-entity based on the db field that contains the full path/filename, but I wasn't sure how to reference the db field from the root query/entity. Before I spent too much time, I also realized I wasn't sure how to get Solr to deal with binary file types this way either, which, upon further reading, seemed like it would require Tika - can that be done within the confines of DataImportHandler? Any advice is greatly appreciated. Thanks in advance, Joe
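As a concrete starting point for the FieldSplitterProcessor idea above, here is a minimal sketch against the Solr 3.x API. The source field, target fields, and the '|' delimiter are placeholders, not anything from Joe's actual setup; a real version would read them from the processor's init args, as Jan describes.

    import java.io.IOException;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class FieldSplitterProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new FieldSplitterProcessor(next);
      }

      static class FieldSplitterProcessor extends UpdateRequestProcessor {
        FieldSplitterProcessor(UpdateRequestProcessor next) {
          super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
          SolrInputDocument doc = cmd.getSolrInputDocument();
          Object raw = doc.getFieldValue("custom1");         // combined source field (placeholder)
          if (raw != null) {
            // Split "id|name" into two values; the delimiter is an assumption.
            String[] parts = raw.toString().split("\\|", 2);
            if (parts.length == 2) {
              doc.setField("cust_id", parts[0].trim());      // target fields (placeholders)
              doc.setField("cust_name", parts[1].trim());
            }
          }
          super.processAdd(cmd);                             // hand off to the rest of the chain
        }
      }
    }

Registered in an updateRequestProcessorChain in solrconfig.xml (see the wiki links above) and referenced from the update handler, this runs on every add, so the split fields get indexed and stored like any others.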
Searching Database
How can I use SOLR (version 3.1) to search in our Microsoft SQL Server database? I looked at the DIH example but that looks like it is for importing. I also looked at the following link: http://wiki.apache.org/solr/DataImportHandler Please send me a link to any instructions to set up SOLR so that I can search the database. Thank You, Roger
Re: Using multiple CPUs for a single document base?
I haven't given it a try, but perhaps opening multiple HTTP connections to the update handler will end up using multiple threads and thus better CPU utilization. Would be nice if someone could prove it here; I'm not in `the lab` right now ;) You say it like it's something you have control over; how would one choose to use more than one thread when indexing? It depends on how you're indexing, of course; if you're using SolrJ it's straightforward. What if you're using the ordinary HTTP POST interface, or DIH? On 5/31/2011 3:35 PM, Markus Jelsma wrote: If you use only one thread when indexing then only one core is going to be used. On May 31, 2011, at 11:29 AM, Jonathan Rochkind wrote: I kind of think you should get multi-CPU use 'for free' as a Java app too. Ah, probably experimental error? If I apply a stress load consisting only of queries, I get automatic multi-core use as expected. I could see where indexing new dox could tend toward synchronization and uniprocessing. Perhaps my original test load was too add-centric; does that make sense? -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
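For what it's worth, SolrJ already ships a client that does the multiple-connections trick: StreamingUpdateSolrServer buffers documents and posts them over a configurable number of threads. A sketch; the URL, queue size, thread count, and field values are made up:

    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ParallelIndexer {
      public static void main(String[] args) throws Exception {
        // Queue of 100 buffered docs, drained by 4 concurrent HTTP connections.
        StreamingUpdateSolrServer server =
            new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);
        for (int i = 0; i < 10000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", Integer.toString(i));
          doc.addField("name", "doc " + i);
          server.add(doc);  // returns quickly; the worker threads do the posting
        }
        server.commit();
      }
    }

Whether that translates into more than one busy CPU core on the server side is exactly the open question in this thread.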
Re: Searching Database
Roger, .. but that looks like it is for importing .. it will remain that way - no matter how often you search for it :) because that is .. what Solr is for. You have to import (and this means indexing, analyzing ..) the whole content that you want to search. You could either use DIH to import that content directly or use the XML / JSON update handlers to push the content to Solr. Regards Stefan On 31.05.2011 21:44, Roger Shah wrote: How can I use SOLR (version 3.1) to search in our Microsoft SQL Server database? I looked at the DIH example but that looks like it is for importing. I also looked at the following link: http://wiki.apache.org/solr/DataImportHandler Please send me a link to any instructions to set up SOLR so that I can search the database. Thank You, Roger
Re: Searching Database
Roger, You have to import/index into Solr before you can search it. Solr can't go into your MS SQL server and search data in there. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Roger Shah rs...@caci.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tue, May 31, 2011 3:44:42 PM Subject: Searching Database How can I use SOLR (version 3.1) to search in our Microsoft SQL Server database? I looked at the DIH example but that looks like it is for importing. I also looked at the following link: http://wiki.apache.org/solr/DataImportHandler Please send me a link to any instructions to set up SOLR so that I can search the database. Thank You, Roger
Re: Searching Database
On Wed, Jun 1, 2011 at 1:14 AM, Roger Shah rs...@caci.com wrote: How can I use SOLR (version 3.1) to search in our Microsoft SQL Server database? I looked at the DIH example but that looks like it is for importing. I also looked at the following link: http://wiki.apache.org/solr/DataImportHandler [...] That is exactly what you need. You first have to import the data into Solr/Lucene before you can search it. I think that you might be mistaken in your view of Solr: It is *not* an add-on to a database that allows search. Regards, Gora
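To make Gora's point concrete, a DIH data-config for a SQL Server source looks roughly like this; the driver class is Microsoft's JDBC driver, while the host, table, columns, and credentials are placeholders:

    <dataConfig>
      <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                  url="jdbc:sqlserver://dbhost:1433;databaseName=mydb"
                  user="solr" password="secret"/>
      <document>
        <entity name="item" query="SELECT id, title, body FROM items">
          <field column="id" name="id"/>
          <field column="title" name="title"/>
          <field column="body" name="body"/>
        </entity>
      </document>
    </dataConfig>

After running a full-import (/dataimport?command=full-import), the rows are indexed and searchable like any other Solr content.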
Re: Searching Database
Roger, how about not hijacking another user's thread, and not hijacking your already hijacked thread twice more? Changing the e-mail subject won't change the headers' contents. How can I use SOLR (version 3.1) to search in our Microsoft SQL Server database? I looked at the DIH example but that looks like it is for importing. I also looked at the following link: http://wiki.apache.org/solr/DataImportHandler Please send me a link to any instructions to set up SOLR so that I can search the database. Thank You, Roger
RE: Searching Database
Sorry, Markus. I was not aware I needed to create a new e-mail. -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Tuesday, May 31, 2011 3:48 PM To: solr-user@lucene.apache.org Subject: Re: Searching Database Roger, how about not hijacking another user's thread, and not hijacking your already hijacked thread twice more? Changing the e-mail subject won't change the headers' contents. How can I use SOLR (version 3.1) to search in our Microsoft SQL Server database? I looked at the DIH example but that looks like it is for importing. I also looked at the following link: http://wiki.apache.org/solr/DataImportHandler Please send me a link to any instructions to set up SOLR so that I can search the database. Thank You, Roger
Re: Better Spellcheck
Hi Tanner, We have something we call DYM ReSearcher that helps in situations like these, esp. with multi-word queries that Lucene/Solr spellcheckers have trouble with. See http://sematext.com/products/dym-researcher/index.html Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Tanner Postert tanner.post...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, May 31, 2011 11:31:07 AM Subject: Better Spellcheck I've tried to use a spellcheck dictionary built from my own content, but my content ends up having a lot of misspelled words, so the spellcheck ends up being less than effective. I could use a standard dictionary, but it may have problems with proper nouns. It also misses phrases. When someone searches for "untied states" I would hope the spellcheck would suggest "united states", but it just recognizes that "untied" is a valid word and doesn't suggest anything. Is there any way around this? Are there any third-party modules or spellcheck systems that I could implement to get these types of features?
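One thing that may help with the built-in spellchecker: spellcheck.onlyMorePopular=true makes it suggest terms that occur more frequently in the index even when the query term itself is a valid indexed word (the "untied" case), and spellcheck.collate=true reassembles the per-word suggestions into a full corrected query. A sketch of such a request, assuming a spellcheck component is wired into the /select handler:

    /select?q=untied+states&spellcheck=true&spellcheck.onlyMorePopular=true&spellcheck.collate=true

It won't fix a dictionary built from misspelled content, but it does address the valid-word and multi-word cases.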
Re: Using multiple CPUs for a single document base?
On May 31, 2011, at 12:44 PM, Markus Jelsma wrote: I haven't given it a try but perhaps opening multiple HTTP connections to the update handler will end up in multiple threads thus better CPU utilization. My original test case had hundreds of HTTP connections (all to the same URL) doing adds, but seemed to use only one CPU core for adding, or to serialize the adds somehow, something like that ... at any rate, I couldn't drive CPU use above ~120% with that configuration. This is quite different from queries. For queries (or a rich query-to-add mix), I can easily drive CPU use into multiple-hundreds of % CPU, with just a few dozen concurrent query connections (running flat out). But adds resist that trick. I don't know whether this means that adds really are using a single thread, or if they're using multiple threads but synchronizing on some monitor. Actually, I can't say I care much: bottom line seems to be I only use one CPU core (plus a negligible marginal bit) for adds. Since I've confirmed that queries spread neatly, I can live with the single-thready adds. In production, it seems likely that I'll be more or less continuously spending one CPU core on adds, and the rest on queries. -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
Solr memory consumption
I run multiple-core Solr with flags: -Xms3g -Xmx6g -D64, but I see this in top after 6-8 hours, and it is still rising: 17485 test214 10.0g 7.4g 9760 S 308.2 31.3 448:00.75 java -Xms3g -Xmx6g -D64 -Dsolr.solr.home=/home/test/solr/example/multicore/ -jar start.jar Are there any ways to limit the memory for sure? Thanks
Re: Using multiple CPUs for a single document base?
As far as I know you're on the right track, adds are single-threaded. You can have multiple threads making indexing requests from your client, but that's primarily aimed at keeping I/O from being the bottleneck; at some point the actual indexing of the documents is single-threaded. It'd be tricky, very tricky, to have multiple threads writing to an index at the same time, much less multiple CPUs. If you're desperate to index quickly, you can index into several cores, even on separate machines, and merge the results. Best Erick On Tue, May 31, 2011 at 4:13 PM, Jack Repenning jrepenn...@collab.net wrote: On May 31, 2011, at 12:44 PM, Markus Jelsma wrote: I haven't given it a try but perhaps opening multiple HTTP connections to the update handler will end up in multiple threads thus better CPU utilization. My original test case had hundreds of HTTP connections (all to the same URL) doing adds, but seemed to use only one CPU core for adding, or to serialize the adds somehow, something like that ... at any rate, I couldn't drive CPU use above ~120% with that configuration. This is quite different from queries. For queries (or a rich query-to-add mix), I can easily drive CPU use into multiple-hundreds of % CPU, with just a few dozen concurrent query connections (running flat out). But adds resist that trick. I don't know whether this means that adds really are using a single thread, or if they're using multiple threads but synchronizing on some monitor. Actually, I can't say I care much: bottom line seems to be I only use one CPU core (plus a negligible marginal bit) for adds. Since I've confirmed that queries spread neatly, I can live with the single-thready adds. In production, it seems likely that I'll be more or less continuously spending one CPU core on adds, and the rest on queries. -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
What's your query result cache's stats?
Hi, I've seen the stats page many times, of quite a few installations and even more servers. There's one issue that keeps bothering me: the cumulative hit ratio of the query result cache, it's almost never higher than 50%. What are your stats? How do you deal with it? In some cases I have to disable it because of the high warming penalty I get in a frequently changing index. This penalty is worse than the very little performance gain I get. Different users accidentally using the same query, or a single user actually browsing the result set, only happens very occasionally. And if I wanted the hit ratio to climb I'd have to increase the cache size and warming size to absurd values, and only then might I just reach about a 60% hit ratio. Cheers,
RE: Solr memory consumption
It could be environment specific (specifics of your top implementation, OS, etc.). On CentOS I see 2986m of virtual memory although -Xmx2g; you have 10g virtual although -Xmx6g. Don't trust it too much... top may count OS buffers for opened files, network sockets, the JVM DLLs themselves, etc. (which is outside Java GC responsibility) in addition to the JVM heap... it counts all memory, not sure... if you don't have big values for 99.9%wa (which means wait I/O - disk swap usage), everything is fine... -Original Message- From: Denis Kuzmenok Sent: May-31-11 4:18 PM To: solr-user@lucene.apache.org Subject: Solr memory consumption I run multiple-core Solr with flags: -Xms3g -Xmx6g -D64, but I see this in top after 6-8 hours, and it is still rising: 17485 test214 10.0g 7.4g 9760 S 308.2 31.3 448:00.75 java -Xms3g -Xmx6g -D64 -Dsolr.solr.home=/home/test/solr/example/multicore/ -jar start.jar Are there any ways to limit the memory for sure? Thanks
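To see what the JVM itself thinks, rather than what top reports, the stock JDK tools are usually enough; a sketch, where <pid> stands for the Solr process id:

    jstat -gcutil <pid> 5000
    jmap -dump:live,format=b,file=solr-heap.hprof <pid>

The first prints heap-generation usage and GC counts every 5 seconds; the second writes a binary heap dump that jvisualvm or the Eclipse Memory Analyzer can open.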
Re: What's your query result cache's stats?
On May 31, 2011, at 2:02 PM, Markus Jelsma wrote: the cumulative hit ratio of the query result cache, it's almost never higher than 50%. What are your stats? How do you deal with it?

warmupTime : 0
cumulative_lookups : 394867
cumulative_hits : 394780
cumulative_hitratio : 0.99
cumulative_inserts : 87
cumulative_evictions : 0

Of course, that's shortly after I ran a query-intensive, not very creative load test (thousands of identical queries of a not very changeable data set). As a matter of fact, the numbers say I had exactly one miss for each insert, and everything else was a cache hit. Which makes perfect sense for my (really dumb) test case. In some cases I have to disable it because of the high warming penalty I get in a frequently changing index. This penalty is worse than the very little performance gain I get. Different users accidentally using the same query, or a single user actually browsing the result set, only happens very occasionally. And if I wanted the hit ratio to climb I'd have to increase the cache size and warming size to absurd values, and only then might I just reach about a 60% hit ratio. If you have humans randomizing the query stream, I'm sure you're right. If you're convinced your queries are unrelated and variable, why would you expect a query cache to help at all? On the other hand, I actually plan to use my Solr base to drive a UI, where the query parameters never change, and the data underneath changes mostly in bursts (generally near the end of the work day), so I suspect I'll only see misses after a document add, while lookups tend to cluster early in the day. So I actually am hoping for a high hit ratio. -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
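For reference, the knobs being discussed live in solrconfig.xml; a sketch with deliberately small, illustrative numbers (a low autowarmCount limits the warming penalty on a frequently changing index):

    <queryResultCache class="solr.LRUCache"
                      size="512"
                      initialSize="512"
                      autowarmCount="32"/>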
Re: Obtaining query AST?
Hi Darren, I think that if I had to get the parsing result, I would create my own QueryComponent which would create the parser in the 'prepare' function (you can take a look at the actual QueryComponent class) and, instead of resolving the query in the 'process' function, I would just parse the query; then it should be possible to serialize the returned Query object to the response. Then you could declare this new query component in the Solr config file. And finally, with SolrJ, you should be able to get the parsed query in the response, deserialize it and do your stuff ;) The Query object could be considered as an AST, I think :). This is how I would start, if I had to do that. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Obtaining-query-AST-tp3007289p3008330.html Sent from the Solr - User mailing list archive at Nabble.com.
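A sketch of Ludovic's idea against the 3.x API; the class and response-key names are made up. The Query object built in prepare() is the parsed tree, so a real implementation would walk and serialize it rather than just call toString():

    import java.io.IOException;
    import org.apache.lucene.search.Query;
    import org.apache.solr.handler.component.QueryComponent;
    import org.apache.solr.handler.component.ResponseBuilder;

    public class ParsedQueryComponent extends QueryComponent {
      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
        super.prepare(rb);  // runs the QParser and stores the resulting Query
        Query q = rb.getQuery();
        rb.rsp.add("parsedQuery", q.toString());  // placeholder serialization
      }
    }

Declared as a searchComponent in solrconfig.xml and put in a handler's component list, the extra response entry is then visible to SolrJ clients.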
Re: Obtaining query AST?
Darren, you can even take a look at the DebugComponent, which returns the parsed query in string form. It uses the QueryParsing class to parse the query; you could perhaps do the same. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Obtaining-query-AST-tp3007289p3008349.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Obtaining query AST?
Ludovic, Thank you for this tip, it sounds useful. Darren On Tue, 2011-05-31 at 14:38 -0700, lboutros wrote: Darren, you can even take a look to the DebugComponent which returns the parsed query in a string form. It uses the QueryParsing class to parse the query, you could perhaps do the same. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Obtaining-query-AST-tp3007289p3008349.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Obtaining query AST?
Hi, have a look at the flexible query parser in Lucene's contrib package [1]. It provides a framework for easily creating different parsing logic. You should be able to access the AST and control how it is translated into a Lucene query (look at the processors and pipeline processors). Once you have your own query parser, it is straightforward to plug it into Solr. [1] http://lucene.apache.org/java/3_1_0/api/contrib-queryparser/index.html -- Renaud Delbru On 31/05/11 19:24, dar...@ontrenet.com wrote: Hi, I want to write my own query expander. It needs to obtain the AST (abstract syntax tree) of an already parsed query string, navigate to certain parts of it (words) and make logical phrases of those words by adding to the AST where necessary. This cannot be done on the string because the query logic (e.g. AND, OR, parens, etc.) cannot be semantically altered, so it must be parsed first. How can this be done with SolrJ? Thanks for any tips. Darren
Re: copyField generates multiple values encountered for non multiValued field
Alexander, I saw the same behavior in 1.4.x with non-multivalued fields when updating a document in the index (i.e., obtaining the doc from the index, modifying some fields and then adding the document with the same id back). I do not know what causes this, but it looks like the copyField logic completely bypasses the multivaluedness check and just adds the value in addition to whatever is already there (instead of replacing the value). So yes, Solr then renders itself into an inconsistent state (note that the index is still correct from Lucene's standpoint). -Alexander On Wed, 2011-05-25 at 16:50 +0200, Alexander Golubowitsch wrote: Dear list, hope somebody can help me understand/avoid this. I am sending an add request with allowDuplicates=false to a Solr 1.4.1 instance. This is for debugging purposes, so I am sending the exact same data that are already stored in Solr's index. I am using the PHP PECL libraries, which fail completely in giving me any hint on what goes wrong. Only sending the same add request again gives me a proper SolrClientException that hints:

ERROR: [288400] multiple values encountered for non multiValued field field2 [fieldvalue, fieldvalue]

The scenario:
- field1 is implicitly single-valued, type text, indexed and stored
- field2 is generated via a copyField directive in schema.xml, implicitly single-valued, type string, indexed and stored

What appears to happen:
- On the first add (SolrClient::addDocuments(array(SolrInputDocument theDocument))), regular fields like field1 get overwritten as intended
- field2, defined with a copyField but still single-valued, gets _appended_ instead
- When I retrieve the updated document in a query and try to add it again, it won't let me because of the inconsistent multi-value state
- The PECL library, in addition, appears to hit some internal exception (that it doesn't handle properly) when encountering multiple values for a single-valued field. That gives me zero results querying a set that includes the document via PHP, while the document can be retrieved properly, though in an inconsistent state, any other way.

But: Solr appears to be generating the corrupted state itself via copyField? What's going wrong? I'm pretty confused... Thank you, Alex
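Until the root cause is understood, one client-side workaround is to drop copyField targets before re-adding a retrieved document, so the copy is regenerated instead of appended. A SolrJ sketch; the id value and field names come from the error above, and everything else (URL, client class usage) is an assumption:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrInputDocument;

    public class ReAddWithoutCopyTargets {
      public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrDocument found =
            server.query(new SolrQuery("id:288400")).getResults().get(0);
        SolrInputDocument doc = new SolrInputDocument();
        for (String name : found.getFieldNames()) {
          if (!"field2".equals(name)) {   // skip the copyField target
            doc.addField(name, found.getFieldValue(name));
          }
        }
        server.add(doc);   // copyField repopulates field2 from field1
        server.commit();
      }
    }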
Odd (i.e. wrong) File Names in 3.1 distro source zip
Hi, all. I just downloaded the apache-solr-3.1.0-src.gz file, and unzipped that. I see inside there an apache-solr-3.1.0-src file, and tried unzipping that. There weren't any errors, but as I look inside the apache-solr-3.1.0-src file, I see that not all the Java code (for example) ended up being unzipped with a .java extension. For example, in the path apache-solr-3.1.0\lucene\backwards\src\test\org\apache\lucene\analysis\tokenattributes I see two files: TestSimpleAtt100644 TestTermAttri100644 Any ideas? Is there some specific tool I should be using to expand these? I'm doing this in Windows XP. Thanks! Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com
Solr vs ElasticSearch
I've been hearing more and more about ElasticSearch. Can anyone give me a rough overview of how these two technologies differ? What are the strengths/weaknesses of each? Why would one choose one over the other? Thanks
Re: Solr vs ElasticSearch
Mark, Nice email address. I personally have no idea; maybe ask Shay Banon to post an answer? I think it's possible to make Solr more elastic, e.g., it's currently difficult to make it move cores between servers without a lot of manual labor. Jason On Tue, May 31, 2011 at 7:33 PM, Mark static.void@gmail.com wrote: I've been hearing more and more about ElasticSearch. Can anyone give me a rough overview of how these two technologies differ? What are the strengths/weaknesses of each? Why would one choose one over the other? Thanks
RE: DIH: Exception with Too many connections
Stephan, Your advice (check the process list) gave me an important clue for my solution. I changed my database connection to point at the slave instead of the master, so that I can use more threads. Thank you very much! *** François, My setting is the default value: max_connections = 151 max_user_connections = 0 I will consider changing max_connections when I increase the number of cores. Thanks! *** Fuad, So far, I can handle DIH with my setting. Thanks for letting me know! Regards, Tiffany -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-Exception-with-Too-many-connections-tp3005213p3009206.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr vs ElasticSearch
Interesting wordings: "we want real-time search, we want simple multi-tenancy, and we want a solution that is built for the cloud". And later, "built on top of Lucene". Is that possible? :) (What does "real-time search" mean anyway... and what is "cloud"?) The community is growing! P.S. I never used ElasticSearch, but I used Compass before moving to SOLR. And Compass uses wordings like "real-time *transactional* search". Yes, it's good and it has its own use case (small databases, reduced development time, junior-level staff, single-JVM environment). I'd consider the requirements first, then see which tool simplifies my task (fulfils most of the requirements). It could be ElasticSearch, or SOLR, or Compass, or direct Lucene, or even SQL, a SequenceFile, an in-memory TreeSet, etc. It also depends on requirements, budget, team skills. -Original Message- From: Mark Sent: May-31-11 10:33 PM To: solr-user@lucene.apache.org Subject: Solr vs ElasticSearch I've been hearing more and more about ElasticSearch. Can anyone give me a rough overview of how these two technologies differ? What are the strengths/weaknesses of each? Why would one choose one over the other? Thanks