Re: OutOfMemoryErrors
You can set it up in the startup script of Tomcat. -- View this message in context: http://lucene.472066.n3.nabble.com/OutOfMemoryErrors-tp1181731p1182582.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: OutOfMemoryErrors
Put that line in your startup script, or you can set it as an environment variable: export CATALINA_OPTS=-Xms256m -Xmx1024m;
Creating new Solr cores using relative paths
I'm having trouble getting the core CREATE command to work with relative paths in the solr.xml configuration. I'm working with a layout like this:

/opt/solr [this is solr.solr.home: $SOLR_HOME]
/opt/solr/solr.xml
/opt/solr/core0/ [this is the template core]
/opt/solr/core0/conf/schema.xml [etc.]
/opt/tomcat/bin [where tomcat is started from: $TOMCAT_HOME/bin]

My very basic solr.xml:

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0/"/>
  </cores>
</solr>

The CREATE core command works fine with absolute paths, but I have a requirement to use relative paths. I want to be able to create a new core like this:

http://localhost:8080/solr/admin/cores?action=CREATE&name=core1&instanceDir=core1&config=core0/conf/solrconfig.xml&schema=core0/conf/schema.xml

(core1 is the name for the new core to be created, and I want to use the config and schema from core0 to create the new core), but the error is always due to the servlet container thinking $TOMCAT_HOME/bin is the current working directory:

Caused by: java.lang.RuntimeException: Can't find resource 'core0/conf/solrconfig.xml' in classpath or '/opt/solr/core1/conf/', cwd=/opt/tomcat/bin

Does anyone know how to make this happen? Thanks, -Jay
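Since CREATE is known to work with absolute paths, one workaround sketch (not a fix for the cwd behaviour itself) is to expand the relative names against solr.solr.home before issuing the call; the paths below follow the layout above and are assumptions about your installation:

```shell
# Build the CREATE URL with absolute paths derived from solr.solr.home,
# so the container's cwd ($TOMCAT_HOME/bin) no longer matters.
SOLR_HOME=/opt/solr   # adjust to your installation
URL="http://localhost:8080/solr/admin/cores?action=CREATE&name=core1"
URL="$URL&instanceDir=$SOLR_HOME/core1"
URL="$URL&config=$SOLR_HOME/core0/conf/solrconfig.xml"
URL="$URL&schema=$SOLR_HOME/core0/conf/schema.xml"
echo "$URL"           # then: curl "$URL"
```

The calling script (rather than Solr) resolves the paths, which sidesteps the servlet container's working directory entirely.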
Re: maxMergeDocs and performance tuning
Okay, thanks Marc. I don't really have any complaints about performance (yet!) but I'm still wondering how the mechanics work, e.g. when you have a number of segments equal to mergeFactor, and each contains maxMergeDocs documents. The docs are a bit fuzzy on this...
Re: OutOfMemoryErrors
Should I add this line with double quotes or not? Because if I don't, it doesn't work at all in my /etc/init.d/tomcat6: export CATALINA_OPTS=-Xms256m -Xmx1024m;

On Tue, Aug 17, 2010 at 1:36 PM, Grijesh.singh pintu.grij...@gmail.com wrote:
> Put that line in your startup script, or you can set it as an environment variable: export CATALINA_OPTS=-Xms256m -Xmx1024m;

-- Chhorn Chamnap http://chamnapchhorn.blogspot.com/
Re: OutOfMemoryErrors
Is there a way to verify that I have added it correctly?

On Tue, Aug 17, 2010 at 2:41 PM, Chamnap Chhorn chamnapchh...@gmail.com wrote:
> Should I add this line with double quotes or not? Because if I don't, it doesn't work at all in my /etc/init.d/tomcat6. [snip]

-- Chhorn Chamnap http://chamnapchhorn.blogspot.com/
Re: OutOfMemoryErrors
You can add it like this; it will work, I am using it: JAVA_OPTS="$JAVA_OPTS -Xms1024m -Xmx4096m"
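For Tomcat, a clean place for this is a setenv.sh next to catalina.sh, which catalina.sh sources on every start. A minimal sketch (the heap sizes are examples only; size them for your own index), which also answers the double-quotes question from earlier in the thread:

```shell
# Hypothetical $CATALINA_HOME/bin/setenv.sh -- sourced by catalina.sh on startup.
# The double quotes matter: without them the shell splits the value at the
# space and the export fails, which is why the unquoted form didn't work.
JAVA_OPTS="$JAVA_OPTS -Xms1024m -Xmx4096m"
export JAVA_OPTS
echo "JAVA_OPTS=$JAVA_OPTS"
```

Putting it in setenv.sh (rather than editing catalina.sh) keeps the flags intact across Tomcat upgrades.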
Search document design problem
Hi all, I would like to use Solr to replace our site search based on MySQL, but I am not sure how to map entities into the search index. The model is described by the attached UML class diagram. I have a Hotel that resides in some City in some Country. The hotel has various Rooms. For each Room in a Hotel there are some Packages that can be purchased by the client. The entity returned from the search will be mainly the Hotel. E.g.:

- all hotels in USA
- all hotels in New York
- all hotels with name containing Hilton
- all hotels in Egypt with packages with all inclusive boarding and price lower than 400 and startDate between 2010-08-20 and 2010-08-30

Our application also uses faceting a lot, e.g.:

- # of hotels per country/city
- # of hotels based on room size (# of beds - 1 bed - 100 hotels, 2 beds - 200 hotels, ...)
- # of hotels based on all inclusive package prices (0-100 EUR, 100-200 EUR, ...)

But there are also use cases when a search should return a Room or Package directly. I'd like to use Data Import Handler to index directly from our database. But which approach to mapping entities into the search index should I use? It seems to me that there are at least 2 ways.

1) One index based on Hotel with multivalued fields for Rooms and multivalued fields for Packages. In DIH:

<document>
  <entity name="hotel" ...>
    <field name="id" .../>
    <entity name="room" ...>
      <field name="room_id" .../>
      <entity name="package" ...>
        <field .../>
      </entity>
    </entity>
  </entity>
</document>

But I am not sure whether this will work due to multivalued fields. The queries may span across all the entities - I want only hotels that have a room with 2 beds and the room has a package with all inclusive boarding and price lower than 400.

2) Denormalize data, so that there will be only one index based on Packages containing (duplicated) all the data from Room and Hotel, and then use Field Collapsing on Hotel ID for search results and faceting too. This would also enable direct search for Packages or Rooms, but I am not sure about Field Collapsing, which is still a kind of beta functionality, and about potential performance costs. Can anybody give me some advice or share their experiences? Thanks a lot, Wenca
Re: Search document design problem
Oops, it seems that the mailing list does not support attachments. Here's a link to the diagram image: http://dl.dropbox.com/u/10214557/model.png

Wenca

On 17.8.2010 11:30, Wenca wrote:
> Hi all, I would like to use Solr to replace our site search based on MySQL, but I am not sure how to map entities into the search index. [snip]
stream.url problem
hi all, I am indexing documents to Solr that are on my system. Now I need to index files that are on a remote system. I enabled remote streaming to true in solrconfig.xml, and when I use stream.url it shows the error "connection refused". When I sent the request in my browser as:

http://localhost:8080/solr/update/extract?stream.url=http://remotehost/home/san/Desktop/programming_erlang_armstrong.pdf&literal.id=schb2

I get the error:

HTTP Status 500 - Connection refused
java.net.ConnectException: Connection refused
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1016)
  at org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:88)
  at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:161)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
  [snip]
Caused by: java.net.ConnectException: Connection refused
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.Socket.connect(Socket.java:525)
  at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
  at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
  at org.apache.solr.common.util.ContentStreamBase$URLStream.<init>(ContentStreamBase.java:81)
  at org.apache.solr.servlet.SolrRequestParsers.buildRequestFrom(SolrRequestParsers.java:136)
  [snip]

If anybody knows, please help me with this. regards, satya
Re: stream.url problem
> hi all, i am indexing the documents to solr that are in my system. now i need to index the files that are in remote system, i enabled the remote streaming to true in solrconfig.xml and when i use the stream.url it shows the error as connection refused [snip]

You probably use the wrong port. Try 8983 instead. /Tim
Re: stream.url problem
If the connector port number on your localhost is the same as on the other system, then this error is probable. You can change the port number in server.xml of your system or the other system and make them different. If they are already different, the other possibility is whether remote access is enabled or not.

Rajani Maski

2010/8/17 Tim Terlegård tim.terleg...@gmail.com
> hi all, i am indexing the documents to solr that are in my system. now i need to index the files that are in remote system [snip]
> You probably use the wrong port. Try 8983 instead. /Tim
Re: OutOfMemoryErrors
> Is there a way to verify that I have added it correctly?

On Linux you can do ps -elf | grep Boot and see if the java command has the parameters added.

@all: why and when do you get those OOMs? While querying? Which queries in detail?

Regards, Peter.
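A slightly more targeted variant of that check, sketched here for Linux (the grep patterns are assumptions; adjust to however your container's java process shows up in ps):

```shell
# Extract just the -Xms/-Xmx heap flags of any running java process.
# The [j] trick keeps grep from matching its own command line.
ps -ef | grep '[j]ava' | grep -o '\-Xm[sx][0-9]*[mgk]' || echo "no heap flags found"
```

If the flags you exported don't show up here, the startup script never passed them to the JVM.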
Re: OutOfMemoryErrors
I am getting it while indexing data to Solr, not while querying. Though I have memory of up to 40 GB and my indexing data is just 5-6 GB, that particular error is still observed now and then (SEVERE ERROR: JAVA HEAP SPACE, OUT OF MEMORY ERROR). I can see one lock file generated in the data/index path just after this error.

On Tue, Aug 17, 2010 at 4:49 PM, Peter Karich peat...@yahoo.de wrote:
> on Linux you can do ps -elf | grep Boot and see if the java command has the parameters added. [snip]
Re: Search document design problem
Hi Wenca, I am not sure whether my information here is really helpful for you, sorry if not ;-)

> I want only hotels that have room with 2 beds and the room has a package with all inclusive boarding and price lower than 400.

You should tell us what you want to search and filter. Do you want only available beds/rooms of a hotel, or all of them? The requirements seem to be a bit tricky, but a combination of dynamic fields and the collapse feature could do it (with only one query). In your case I would start indexing the hotels like:

name: hilton
country: USA
city: New York
beds_i (multivalued): 2 | 1 | 1 | ...
rooms_i: 123
...

I am not sure how I would handle the booking/prices. Maybe you will have to add an additional dynamic field free_beds_periodX_i or price_periodX_i which reports the free beds or prices for a specific period? (where one period could be a week or even a day ...)

For the other searches I would create another index, although it is possible to put all the data in one index and e.g. add a 'type' field to each document. With that field you can then append a filter query to each query: q=xy&fq=type:hotel OR type:room

I would prefer this trick over the collapse feature (if you really want to set up only one index) at the beginning and see if this could work for you. (The collapse feature is not as mature as the rest of Solr, but in some situations it works nicely.) Hope this helps a bit to get started. (Regarding the 'Data Import Handler' I cannot help, sorry)

Regards, Peter.

> Hi all, I would like to use Solr to replace our site search based on MySQL, but I am not sure how to map entities into the search index. [snip]
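Peter's filter-query trick looks roughly like this as a request; the 'type' field and its values are his hypothetical schema, not something Solr provides out of the box:

```shell
# One index with a 'type' discriminator field per document; a filter query
# restricts the search to the wanted entity types without touching scoring.
Q='http://localhost:8080/solr/select?q=hilton'
Q="$Q&fq=type:hotel+OR+type:room"   # '+' stands for a URL-encoded space
echo "$Q"                           # then: curl "$Q"
```

Because fq clauses are cached independently of the main query, the same type filter is cheap to reuse across many searches.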
Re: stream.url problem
Connection refused (in any context) almost always means that nothing is listening on the TCP port that you are trying to connect to. So either the process you are connecting to isn't running, or you are trying to connect to the wrong port.

On Tue, Aug 17, 2010 at 6:18 AM, satya swaroop sswaro...@gmail.com wrote:
> hi all, i am indexing the documents to solr that are in my system. now i need to index the files that are in remote system, i enabled the remote streaming to true in solrconfig.xml and when i use the stream.url it shows the error as connection refused and the detail of the error is::: when i sent the request in my browser as:: http://localhost:8080/solr/update/extract?stream.url=http://remotehost/home/san/Desktop/programming_erlang_armstrong.pdf&literal.id=schb2 i get the error as HTTP Status 500 - Connection refused java.net.ConnectException: Connection refused [snip] if any body know please help me with this regards, satya
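A quick way to tell which of the two it is, before digging into Solr at all. A sketch; substitute the host and port from your own stream.url:

```shell
# Probe the host/port that stream.url points at. A refusal here means the
# remote web server (not Solr) is the problem. --connect-timeout keeps
# the check from hanging on unreachable hosts.
HOST=remotehost   # the host from your stream.url
PORT=80
if curl -s -o /dev/null --connect-timeout 3 "http://$HOST:$PORT/"; then
  echo "something is listening on $HOST:$PORT"
else
  echo "nothing listening on $HOST:$PORT (refused, or host unreachable)"
fi
```

If this fails, fix the remote web server first; Solr's stream.url fetch uses a plain HTTP connection to the same place.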
synonyms in EmbeddedSolrServer
Synonyms don't seem to work in EmbeddedSolrServer (Solr 1.4.0) when mixing in multi-word synonyms. It works fine when I run Solr standalone. Did anyone else experience this? I have this in synonyms.txt:

word => some, other stuff

I index "some" and then search for "word". With standalone Solr I get a hit, but not when using the exact same configuration files with EmbeddedSolrServer. If there is a multi-word synonym on the same line it doesn't work. If I remove "other stuff" from synonyms.txt I get a hit also with EmbeddedSolrServer. /Tim
Re: Search document design problem
Hi Peter, in fact I mainly want to search Hotels by any combination of their fields and their rooms and packages. Users can set up any combination in a dynamic form that changes after every change of the query. But maybe the right term for this use case is filter, not search. The form enables filtering of hotels by a predefined set of properties. It also allows refining with some text search - so probably only in this case I should use the term search. To illustrate it I will describe a simple search (or filter?) form example:

Message: now available 1000 hotels
Select country: combo with countries
Select place: combo with places
Select start date: combo with available dates
Select boarding: combo with available boarding types
Select price: combo with available prices (0-100, 100-200)
Select room size: combo with available bed counts

At first the form is populated by all available values. But when the user selects a country, all other fields are repopulated with values available only for hotels in the selected country, and the label above the form changes the number of available hotels according to the search criteria. Afterwards the user selects a price level, and again the number of available hotels changes and the other combo boxes are repopulated with the values available according to the selected country and price level. If I am not wrong, this is the place for faceting, isn't it?

When the user is done with the search form, he can display the matching hotels and afterwards continue with displaying matching rooms and packages in a particular hotel. In fact I am always looking for a concrete Package (offer, booking) that matches all the given criteria, but in the search output there is always only the hotel containing the given Package, and likewise for faceting counts: no matter whether there are 1 or 50 packages matching the query, the hotel is counted only once, just as it is listed only once in the results.

Faceting is the thing I am really interested in, because the described functionality implemented on top of an RDBMS results in multiple SQL queries joining multiple tables each time a user changes the search criteria, so the load on the DB servers is quite high. I would like to use Solr for the search form because after some tests Lucene seems to be really very fast, and I believe it would improve the response time as well as the throughput of our search. I am new to Solr, so excuse me if I don't use the right terminology yet, but I hope that my description of the use case is quite clear now. ;-)

Thanks, Wenca

On 17.8.2010 13:46, Peter Karich wrote:
> Hi Wenca, I am not sure whether my information here is really helpful for you, sorry if not ;-) [snip]
Re: OutOfMemoryErrors
Which method do you use to index? If you are using SolrJ you can use the streaming update server; it is a better option for the Solr server, because the server does not need to hold it all in memory. (If you are using the post.jar file, there was a bug which caused OOMs, but I don't remember it exactly ...) Then, when the Solr server crashes, it cannot remove the lock file. If you are sure there is only one indexing process you could specify:

<indexDefaults>
  <lockType>native</lockType> <!-- or any other possible too ?? -->
  <unlockOnStartup>true</unlockOnStartup>
</indexDefaults>

see also:
- http://www.mail-archive.com/solr-user@lucene.apache.org/msg08049.html
- "Out Of Memory Errors" on http://wiki.apache.org/solr/SolrPerformanceFactors

Regards, Peter.

> I am getting it while indexing data to solr not while querying... Though I have enough memory space upto 40GB and my indexing data is just 5-6 GB [snip]
Re: indexing???
On Aug 16, 2010, at 10:38pm, satya swaroop wrote: hi all, the error i got is Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@8210fc when i indexed a file similar to the one in https://issues.apache.org/jira/browse/PDFBOX-709/samplerequestform.pdf 1. This URL doesn't work for me. 2. Please include the full stack trace from the RuntimeException. 3. What version of Tika are you using? Thanks, -- Ken Ken Krugler +1 530-210-6378 http://bixolabs.com e l a s t i c w e b m i n i n g
Re: Search document design problem
Hi Wenca,

> But maybe the right term for this use case is filter not a search ... in this case I should use the term search.

No problem. A search query to Solr can also contain filters, so 'search' is OK for all situations, I think ;-)

> Afterwards the user selects a price level and again number of available hotels changes

Every hotel can have a price range, right? Or distinct values? For a price range you can use a field price_max and price_min for every hotel. For distinct values (if not too many) you could use a multivalued price field.

> If I am not wrong, this is the place for Faceting, isn't it?

Yes, you can use facets to show only a subset of your combo values (i.e. show facets with count > 0). But for every change to the combo boxes you will need to fire a query and change the hotel numbers. Facets then show how many hotels will be available if one would apply the specific filter query. You could even enhance the combo fields with the available hotel number... for a better user experience. So, simply try to get this basic search done (which shouldn't take too much time) and ask if you have problems. If you are convinced of Solr's relevance and performance (which I bet on) you can think about the other searches.

> In fact I am always looking for a concrete Package (offer, booking) that matches all the given criteria, but on the search output there is always only the hotel containing the given Package as well for faceting counts, no matter whether there is 1 or 50 packages matching the query, the hotel is counted only once as well as it is listed only once in the results.

This sounds to me like you should use only one hotel index, where you add a multivalued field 'package' directly to every hotel.

Regards, Peter.

> Hi Peter, in fact I mainly want to search Hotels by any combination of its fields and its rooms and packages. [snip]
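The whole form round-trip described above can collapse into a single request of roughly this shape; the field names are illustrative (borrowed from the hypothetical hotel schema in this thread), and range values would need URL-encoding in a real call:

```shell
# One query returns both the matching-hotel count (numFound, with rows=0 so
# no documents are fetched) and the values to repopulate every combo box
# (facet counts) -- replacing the multi-join SQL on each form change.
Q='http://localhost:8983/solr/select?q=*:*&rows=0'
Q="$Q&fq=country:Egypt"                        # filters from the combos chosen so far
Q="$Q&facet=true&facet.mincount=1"             # only show values that still match
Q="$Q&facet.field=city&facet.field=boarding&facet.field=beds_i"
echo "$Q"                                      # then: curl "$Q"
```

facet.mincount=1 is what makes the combo boxes shrink to the still-available values as filters accumulate.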
autocomplete: case-insensitive and middle word
I have a couple of questions about implementing an autocomplete function in Solr. Here's my scenario: I have a name field that usually contains two or three names. For instance, let's suppose it contains:

John Alfred Smith
Alfred Johnson
John Quincy Adams
Fred Jones

I'd like the autocomplete to be case-insensitive and match any of the names, preferably just at the beginning. In other words, if the user types alf, I want:

John Alfred Smith
Alfred Johnson

If the user types fre, I want:

Fred Jones

but not:

John Alfred Smith
Alfred Johnson

I can get the matches using the text_lu analyzer, but the hints that are returned are lower case, and only one name. If I use the string analyzer, I get the entire name like I want it, but the user must match the case, that is, must type Alf, and it only matches the first name, not the middle name. How can I get the matches of the text_lu analyzer, but get the hints like the string analyzer? Thanks, Paul
Re: autocomplete: case-insensitive and middle word
This thread might help - http://www.lucidimagination.com/search/document/9edc01a90a195336/enhancing_auto_complete Cheers Avlesh @avlesh http://twitter.com/avlesh | http://webklipper.com

On Tue, Aug 17, 2010 at 8:30 PM, Paul p...@nines.org wrote: [...]
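One recipe that addresses both requirements (not discussed in this thread, so treat it as a hedged sketch): index the name field through a lowercasing edge-n-gram chain for matching, and display the stored, original-case field value as the hint. Something along these lines, with the fieldType name invented here:

```xml
<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the whitespace tokenizer splits each name into its own token before the edge n-grams are built, a query for alf matches Alfred in any position, and returning the stored field (rather than the analyzed tokens) keeps the hint in its original case.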
Re: Solr-HOW TO HANDLE THE LOCK FILE CREATION WHILE INDEXING AND OPERATION TIMED OUT WEB EXCEPTION ERROR
It would help a lot if you included the stack trace of the exception; perhaps it'll be in your Solr logs. Also, what is your environment? Are you using any kind of networked drive for your index? Windows? What version of Solr are you using? Anything else you think would be useful. Best Erick

On Tue, Aug 17, 2010 at 12:10 AM, rajini maski rajinima...@gmail.com wrote: Hello Everyone, Please help me understand the logic behind this lock file generation while indexing data in Solr! The trouble I am facing is as follows: The data that I indexed is nearly in the millions. At the initial level of indexing I find no errors, until it crosses about 10 lakh documents... But once it crosses this limit it throws a web exception error, operation timed out! And simultaneously a kind of LOCK file is generated in the //data/index folder. I found in one thread ( http://www.mail-archive.com/solr-user@lucene.apache.org/msg06782.html ) that it can be fixed by making some changes in Solr's config XML and also by increasing the Java memory space in Tomcat. And I did that... Still the issue is not solved and I couldn't find any root cause for this error. Please, whoever knows the logic behind these two issues, i.e.: 1) the web exception error, operation timed out, and 2) why lock files are created and how they actually work! Awaiting replies. Regards, Rajani Maski
Function query to boost scores by a constant if all terms are present
Let me describe what I'm trying to accomplish, first, since what I think is the solution is almost always wrong. :-) I'm doing dismax queries with mm set such that not all terms need to match, e.g. only 2 of 3 query terms need to match. Most of the time, items that match all three terms will float to the top by normal ranking, but sometimes there are only two terms that are like a rash across the record, and they end up with a higher score than some items that match all three query terms. I'd like to boost items with all the query terms to the top *without changing their order*. My first thought was to use a simple boost query allfields:(a AND b AND c), but the order of the set of records that contain all three terms changes when I do that. What I *think* I need to do is basically to say, Hey, all the items with all three terms get an extra 40,000 points, but change nothing else. I keep thinking I can get what I need with a subquery and map, but keep failing. Any advice would be very, very welcome. -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: Solr date NOW - format?
On 4/9/2010 7:35 PM, Lance Norskog wrote: Function queries are notoriously slow. Another way to boost by year is with range queries:

[NOW-6MONTHS TO NOW]^5.0
[NOW-1YEARS TO NOW-6MONTHS]^3.0
[NOW-2YEARS TO NOW-1YEARS]^2.0
[* TO NOW-2YEARS]^1.0

Notice that you get to have a non-linear curve when you select the ranges by hand.

Lance, I have worked out my major issue and now have my post date in Solr as a tdate field named pd. I cannot, however, figure out how to actually send a query with a date boost like you've mentioned above. I'd like to embed it right into the dismax handler definition, but it would be good to also know how to send it in a query myself. Can you help? Are the boosts indicated above a multiplier, or an addition? Thanks, Shawn
Re: OutOfMemoryErrors
You shouldn't be getting this error at all unless you're doing something out of the ordinary. So, it'd help if you told us: What parameters have you set for merging? What parameters have you set for the JVM? What kind of documents are you indexing? The memory you have is irrelevant if you only allocate a small portion of it for the running process... Best Erick

On Tue, Aug 17, 2010 at 7:35 AM, rajini maski rajinima...@gmail.com wrote: I am getting it while indexing data to Solr, not while querying... Though I have enough memory space, up to 40GB, and my indexing data is just 5-6 GB, that particular error is still observed... (SEVERE ERROR: JAVA HEAP SPACE, OUT OF MEMORY ERROR) I could see one lock file generated in the data/index path just after this error.

On Tue, Aug 17, 2010 at 4:49 PM, Peter Karich peat...@yahoo.de wrote: Is there a way to verify that I have added it correctly? On Linux you can do ps -elf | grep Boot and see if the java command has the parameters added. @all: why and when do you get those OOMs? While querying? Which queries in detail? Regards, Peter.
Re: OutOfMemoryErrors
<mergeFactor>100</mergeFactor>
JVM initial memory pool - 256MB
Maximum memory pool - 1024MB

<add>
  <doc>
    <field>long:ID</field>
    <field>str:Body</field>
    ... 12 fields ...
  </doc>
</add>

I have a Solr instance in the solr folder (D:/Solr); free space on disc is 24.3GB. How will I get to know what portion of memory Solr is using?

On Tue, Aug 17, 2010 at 10:11 PM, Erick Erickson erickerick...@gmail.com wrote: [...]
Re: Solr-HOW TO HANDLE THE LOCK FILE CREATION WHILE INDEXING AND OPERATION TIMED OUT WEB EXCEPTION ERROR
Yes, it is a networked kind, and on Windows. The Solr version is Solr 1.4.0, Tomcat 6. The exception is a System.Net.WebException: Operation has timed out; httprequest.getresponse failed. For the web exception error, do I need to change the ramBufferSize parameter and the merge factor parameters in the config XML? And for the lock file, is there any setting I need to make? Why and how does it get generated? If you know, please explain it; I am not able to understand it. Thanks a lot for the reply. Regards, Rajani Maski

On Tue, Aug 17, 2010 at 9:41 PM, Erick Erickson erickerick...@gmail.com wrote: [...]
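For background on the lock-file question: the file is Lucene's write.lock, created when an IndexWriter opens the index for writing and removed when the writer closes cleanly; one left behind after a crash or timeout blocks further writes. The locking behaviour is configurable in solrconfig.xml; a sketch (check the exact element placement against your Solr 1.4 example config):

```xml
<indexDefaults>
  <!-- "native" uses OS-level locking; "simple" uses a plain lock file.
       Locking over networked drives is unreliable, which may explain stale locks. -->
  <lockType>native</lockType>
</indexDefaults>
<mainIndex>
  <!-- If true, any lingering write.lock is removed when Solr starts. Use with care. -->
  <unlockOnStartup>false</unlockOnStartup>
</mainIndex>
```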
Re: OutOfMemoryErrors
There are more merge parameters; what values do you have for these:

<mergeFactor>10</mergeFactor>
<maxBufferedDocs>1000</maxBufferedDocs>
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>10000</maxFieldLength>

See: http://wiki.apache.org/solr/SolrConfigXml Hope that formatting comes through the various mail programs OK. Also, what else happens while you're indexing? Do you search while indexing? How often do you commit your changes?

On Tue, Aug 17, 2010 at 1:18 PM, rajini maski rajinima...@gmail.com wrote: [...]
Re: OutOfMemoryErrors
Yeah, sorry, I forgot to mention the others...

<mergeFactor>100</mergeFactor>
<maxBufferedDocs>1000</maxBufferedDocs>
<maxMergeDocs>10</maxMergeDocs>
<maxFieldLength>1</maxFieldLength>

Above are the values. Is this because of the values here? Initially I had the mergeFactor parameter at 10 and maxMergeDocs at 1. With the same error I changed them to the above values... Yet I got that error after the index was about 2 lakh docs...

On Tue, Aug 17, 2010 at 11:04 PM, Erick Erickson erickerick...@gmail.com wrote: [...]
RE: Solr synonyms format query time vs index time
Hi Michael, I think the problem you're seeing is that no document contains reebox, and you've used the explicit syntax (source => dest) instead of the equivalent-terms syntax (term,term,term). I'm guessing that if you convert your synonym file from:

reebox => Reebok

to:

reebox, Reebok

and leave expand=true, and then reindex, everything will work: your indexed documents containing Reebok will be made to include reebox, so queries for reebox will produce hits on those documents. Steve

-----Original Message----- From: mtdowling [mailto:mtdowl...@gmail.com] Sent: Tuesday, August 17, 2010 2:24 PM To: solr-user@lucene.apache.org Subject: Solr synonyms format query time vs index time

My company recently started using Solr for site search and autocomplete. It's working great, but we're running into a problem with synonyms. We are generating a synonyms.txt file from a database table and using that synonyms.txt file at index time on a text-type field. Here's an excerpt from the synonyms file:

reebox => Reebok
shinguards => Shin Guards
shirt => T-Shirt,Shirt
shmak => Shmack
shocks => shox
skateboard => Skate
skateboarding => Skate
skater => Skate
skates => Skate
skating => Skate
skirt => Dresses

When we do a search for reebox, we want the term to be mapped to Reebok through explicit mapping, but for some reason this isn't happening.
We do have multi-word synonyms, and from what I've read on the mailing list, those only work at index time, so we are only using the synonym filter factory at index time:

<fieldType name="search" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Here are more relevant schema.xml configs:

<field name="mashup" type="search" indexed="true" stored="false" multiValued="true"/>
<copyField source="keywords" dest="mashup"/>
<copyField source="category" dest="mashup"/>
<copyField source="name" dest="mashup"/>
<copyField source="brand" dest="mashup"/>
<copyField source="description_overview" dest="mashup"/>
<copyField source="sku" dest="mashup"/>
<!-- other copy fields... -->

The output of the query analyzer shows the following (each stage emits the same single token: position 1, text reebox, type word, start/end 0,6, no payload):

Query Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}
org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=0, catenateWords=1, generateWordParts=0, catenateAll=0, catenateNumbers=1}
org.apache.solr.analysis.LowerCaseFilterFactory {}
org.apache.solr.analysis.SnowballPorterFilterFactory {protected=protwords.txt, language=English}
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}

So reebox is never converted to Reebok. I thought that if I had index-time synonyms with expansion configured, I wouldn't need query-time synonyms. Maybe my dynamic synonyms generation isn't formatted correctly for my desired result? If I use the same synonyms.txt file and use the index analyzer, reebox is mapped to Reebok and then indexed correctly: Index Analyzer
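To make Steve's suggestion concrete, the change to the generated synonyms.txt would look like this (illustrative excerpt; the comments explain the Solr synonym-file semantics):

```
# Explicit one-way mapping: only rewrites the left-hand side. With no document
# containing "reebox", a query for it still matches nothing.
reebox => Reebok

# Equivalence list: with expand="true" at index time, documents containing
# "Reebok" are also indexed under "reebox", so the query now hits.
reebox, Reebok
```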
Re: OutOfMemoryErrors
A merge factor of 100 is very high and out of the norm. Try starting with a value of 10. I've never seen a running system with a value anywhere near this high. Also, what is your setting for ramBufferSizeMB? -Jay

On Tue, Aug 17, 2010 at 10:46 AM, rajini maski rajinima...@gmail.com wrote: [...]
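Pulling the advice in this thread together, a conservative indexing configuration looks like the stock solrconfig.xml defaults (values illustrative, not prescriptive):

```xml
<indexDefaults>
  <mergeFactor>10</mergeFactor>         <!-- 100 is far outside the norm -->
  <ramBufferSizeMB>32</ramBufferSizeMB> <!-- flush buffered docs at this size -->
  <maxFieldLength>10000</maxFieldLength>
</indexDefaults>
```

combined with an explicit heap for Tomcat, e.g. CATALINA_OPTS="-Xms256m -Xmx1024m", and a commit every few thousand documents rather than one large uncommitted batch.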
Sort by date, filter by score?
I have had a request from our development team. I did some searching and could not find an answer. They want to sort by a date field but filter out all results below a minimum relevancy score. Is this possible? I suspect that our only option will be to do the search sorted by relevancy and then sort them ourselves. Thanks, Shawn
Re: Sort by date, filter by score?
They want to sort by a date field but filter out all results below a minimum relevancy score. Is this possible? Earlier Yonik proposed a solution to a similar need. http://search-lucene.com/m/4AHNF17wIJW1
sort order of missing items
When items are sorted, are all the docs with the sort field missing considered tied in terms of their sort order, or are they indeterminate, or do they have some arbitrary order imposed on them (e.g. _docid_)? For example, would b be considered as part of the sort in the following query, or would all the missing 'a' fields be in some kind of order already, thus making the sort algorithm never check the 'b' field? /select/?q=-a:[* TO *]sort=a asc,b asc And would sortMissingLast / sortMissingFirst affect the answer to that question? I've been seeing weird behaviour in my index with queries (a little) like this one, but I haven't pinpointed the problem yet. Brad
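For reference, the sortMissingFirst/sortMissingLast attributes are set on the fieldType in schema.xml; a sketch (fieldType name invented):

```xml
<!-- Docs missing this field sort after all docs with values, for both asc and desc. -->
<fieldType name="string_sml" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<!-- With neither attribute set, the Lucene default applies: a missing value is
     treated as the lowest possible value (first on asc, last on desc). -->
```

Within any group of docs tied on 'a' (including those missing it), the secondary sort=b asc should then be consulted to break the tie, rather than leaving the order arbitrary.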
Re: Solr date NOW - format?
I think 'bq=' is what you want. In dismax the main query string is assumed to go against a bunch of fields. This query is in the standard (Lucene) format. The query strings should handle the ^number syntax. http://www.lucidimagination.com/search/document/CDRG_ch07_7.4.2.9

On Tue, Aug 17, 2010 at 9:40 AM, Shawn Heisey s...@elyograg.org wrote: [...] -- Lance Norskog goks...@gmail.com
Re: Solr synonyms format query time vs index time
solr/admin/analysis.jsp lets you see how this works. Use the index boxes. Lance

On Tue, Aug 17, 2010 at 11:56 AM, Steven A Rowe sar...@syr.edu wrote: [...]
Re: Function query to boost scores by a constant if all terms are present
I'd like to boost items with all the query terms to the top *without changing their order*. [...]

This is a hard task, and I am not sure it is possible. You would need to change the similarity algorithm for that. The final score is composed of many factors: coord, norm, tf-idf... http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html Maybe you can try to customize coord(q,d). But there can always be cases like the one you describe. For example, a very long document containing all three terms will be punished for its length, and a very short document with only two query terms can pop up above it. It is easy to rank items with all three terms so that they come first (omitNorms=true and omitTermFreqAndPositions=true should almost do it), but the "change nothing else" part is not. The easiest thing may be to issue an additional query with a pure AND operator and display those results in a special way.
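One workaround, sketched here rather than taken from the thread: instead of boosting, partition the result list with a filter query, which leaves the dismax ordering inside each group untouched (field and term names are placeholders):

```
/select?qt=dismax&mm=2&q=a b c&fq=allfields:(a AND b AND c)    group 1: all terms present
/select?qt=dismax&mm=2&q=a b c&fq=-allfields:(a AND b AND c)   group 2: partial matches
```

Rendering group 1 before group 2 gives the "extra 40,000 points" effect without perturbing any scores; the cost is a second query once the first group is exhausted.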
queryResultCache has no hits for date boost function
Hi all, my queryResultCache gets no hits. But if I remove one line from the bf section in my dismax handler, all is fine. Here is the line: recip(ms(NOW,date),3.16e-11,1,1) According to http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents this should be fine. So what is the problem with this, and how could I fix it? Regards, Peter. PS: the problem was raised in the thread 'Improve Query Time For Large Index'.
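NOW resolves to the current time in milliseconds, so the function value (and therefore the cache key) differs on every request, which would explain the missing cache hits. A hedged fix (verify against your Solr version) is to round NOW with date math so the value only changes once per interval, e.g. recip(ms(NOW/DAY,date),3.16e-11,1,1). The 3.16e-11 constant is roughly 1/(milliseconds per year); a small self-contained check of what that boost evaluates to:

```java
// Reproduces Solr's recip(x, m, a, b) = a / (m*x + b) outside Solr,
// to show how the date boost above decays with document age.
public class RecipCheck {
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }
    public static void main(String[] args) {
        double msPerYear = 365.25 * 24 * 3600 * 1000; // ~3.156e10 ms
        // A brand-new document (age 0 ms) gets the full boost of 1.0;
        // a one-year-old document gets roughly half of that.
        System.out.println(recip(0, 3.16e-11, 1, 1));
        System.out.println(recip(msPerYear, 3.16e-11, 1, 1));
    }
}
```

So the boost halves roughly once per year of document age, which matches the wiki's intent for the constant.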
changable DIH datasource based on environment variables
I defined my DIH datasource in solrconfig.xml. Is there a way to define two sets of data sources and use one based on the current system's environment variable? (e.g. APP_ENV=production or APP_ENV=development) I run the DIH on my local machine and a remote server. They use different MySQL datasources for importing. -- @tommychheng Programmer and UC Irvine Graduate Student Find a great grad school based on research interests: http://gradschoolnow.com
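One approach (a sketch; Solr's config substitution reads JVM system properties rather than OS environment variables, so the value has to be passed through at startup): use ${property:default} placeholders where the dataSource is defined in solrconfig.xml. The dih.db.* property names below are invented for illustration:

```xml
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
            url="${dih.db.url:jdbc:mysql://localhost/dev_db}"
            user="${dih.db.user:dev}"
            password="${dih.db.password:}"/>
```

On the remote server, start Tomcat with something like CATALINA_OPTS="-Ddih.db.url=jdbc:mysql://dbhost/prod_db -Ddih.db.user=prod"; on the local machine the defaults apply, so one config serves both environments.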
Integrating Solr's SynonymFilter in lucene
I am trying to get multi-word synonyms to work in Lucene using Solr's SynonymFilter. I need to match synonyms at index time, since many of the synonym lists are huge. Actually they are not really synonyms, but words that belong to a concept. For example, I would like to map {New York, Los Angeles, New Orleans, Salt Lake City, ...}, a bunch of city names, to the concept called city. While searching, the user query for the concept city will be translated to a keyword, say CONCEPTcity, which is the synonym for any city name. Using Lucene's SynonymAnalyzer, as explained in Lucene in Action (p. 131), all I could match for CONCEPTcity is single-word city names like Chicago, Seattle, Boston, etc. It would not match multi-word city names like New York, Los Angeles, etc. I tried using Solr's SynonymFilter in the tokenStream method of a custom Analyzer (that extends org.apache.lucene.analysis.Analyzer - Lucene ver. 2.9.3):

    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new SynonymFilter(
            new WhitespaceTokenizer(reader), synonymMap);
        return result;
    }

where synonymMap is loaded with synonyms using

    synonymMap.add(conceptTerms, listOfTokens, true, true);

where conceptTerms is of type ArrayList<String> holding all the terms in a concept, and listOfTokens is of type List<Token> and contains only the generic synonym identifier, like CONCEPTcity. When I print synonymMap using synonymMap.toString(), I get output like

    {New York={Chicago={Seattle={New Orleans=[(CATEGORYcity,0,0,type=SYNONYM),ORIG],null}}}}

so it looks like all the synonyms are loaded. But if I search for CATEGORYcity then it says no matches found. I am not sure whether I have loaded the synonyms correctly into the synonymMap. Any help will be deeply appreciated. Thanks!
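A guess at the cause, based only on the toString() output shown: the nested {New York={Chicago={Seattle=...}}} structure suggests all the city names were passed to a single synonymMap.add() call, which builds one token chain that only matches the entire sequence of names in order. Adding one phrase per call should produce a separate entry for each city. An untested sketch against that era's org.apache.solr.analysis.SynonymMap API (cityNames and the other variable names are invented):

```java
// Sketch: one SynonymMap entry per multi-word city name, each mapping
// to the single concept token CONCEPTcity; includeOrig keeps the city name too.
for (String cityName : cityNames) {
    List<String> phrase = Arrays.asList(cityName.split("\\s+"));
    List<Token> replacement =
        Collections.singletonList(new Token("CONCEPTcity", 0, 0));
    synonymMap.add(phrase, replacement, true /* includeOrig */, true /* mergeExisting */);
}
```

Printing the map afterwards should then show each city as its own top-level key rather than one nested chain, and multi-word names like New York should match.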
Re: Solr date NOW - format?
Would I do separate bq values for each of the ranges, or is there a way to include them all at once? If it's the latter, I'll need a full example with a field name, because I'm clueless. :) On 8/17/2010 2:29 PM, Lance Norskog wrote: I think 'bq=' is what you want. In dismax the main query string is assumed to go against a bunch of fields. This query is in the standard (Lucene++) format. The query strings should handle the ^number syntax. http://www.lucidimagination.com/search/document/CDRG_ch07_7.4.2.9 On Tue, Aug 17, 2010 at 9:40 AM, Shawn Heiseys...@elyograg.org wrote: On 4/9/2010 7:35 PM, Lance Norskog wrote: Function queries are notoriously slow. Another way to boost by year is with range queries: [NOW-6MONTHS TO NOW]^5.0 , [NOW-1YEARS TO NOW-6MONTHS]^3.0 [NOW-2YEARS TO NOW-1YEARS]^2.0 [* TO NOW-2YEARS]^1.0 Notice that you get to have a non-linear curve when you select the ranges by hand. Lance, I have worked out my major issue and now have my post date in Solr as a tdate field named pd. I cannot however figure out how to actually send a query with a date boost like you've mentioned above. I'd like to embed it right into the dismax handler definition, but it would be good to also know how to send it in a query myself. Can you help? Are the boosts indicated above a multiplier, or an addition?
Re: OutOfMemoryErrors
Yeah fine, I will do that. Before, the mergeFactor was 10 itself; after finding this error I just set its value higher, assuming that could be the cause. I will change it back. The ramBufferSize is 256MB. Do I need to change this value to something higher? On Wed, Aug 18, 2010 at 12:27 AM, Jay Hill jayallenh...@gmail.com wrote: A merge factor of 100 is very high and out of the norm. Try starting with a value of 10. I've never seen a running system with a value anywhere near this high. Also, what is your setting for ramBufferSizeMB? -Jay On Tue, Aug 17, 2010 at 10:46 AM, rajini maski rajinima...@gmail.com wrote: yeah sorry I forgot to mention the others:

<mergeFactor>100</mergeFactor>
<maxBufferedDocs>1000</maxBufferedDocs>
<maxMergeDocs>10</maxMergeDocs>
<maxFieldLength>1</maxFieldLength>

Above are the values. Is this because of the values here? Initially I had the mergeFactor parameter at 10 and maxMergeDocs at 1. With the same error I changed them to the above values, yet I got the error after the index reached about 2 lakh (200,000) docs. On Tue, Aug 17, 2010 at 11:04 PM, Erick Erickson erickerick...@gmail.com wrote: There are more merge parameters; what values do you have for these:

<mergeFactor>10</mergeFactor>
<maxBufferedDocs>1000</maxBufferedDocs>
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>1</maxFieldLength>

See: http://wiki.apache.org/solr/SolrConfigXml Hope that formatting comes through the various mail programs OK. Also, what else happens while you're indexing? Do you search while indexing? How often do you commit your changes? On Tue, Aug 17, 2010 at 1:18 PM, rajini maski rajinima...@gmail.com wrote:

<mergeFactor>100</mergeFactor>

JVM Initial memory pool - 256MB
Maximum memory pool - 1024MB

<add>
  <doc>
    <field>long:ID</field>
    <field>str:Body</field>
    ... 12 fields ...
  </doc>
</add>

I have a solr instance in a solr folder (D:/Solr); free space on disc is 24.3GB. How will I get to know what portion of memory solr is using?
On Tue, Aug 17, 2010 at 10:11 PM, Erick Erickson erickerick...@gmail.com wrote: You shouldn't be getting this error at all unless you're doing something out of the ordinary. So, it'd help if you told us: what parameters you have set for merging, what parameters you have set for the JVM, and what kind of documents you are indexing. The memory you have is irrelevant if you only allocate a small portion of it for the running process... Best Erick On Tue, Aug 17, 2010 at 7:35 AM, rajini maski rajinima...@gmail.com wrote: I am getting it while indexing data to solr, not while querying... Though I have enough memory space, up to 40GB, and my indexing data is just 5-6 GB, that particular error is still observed occasionally (SEVERE ERROR: JAVA HEAP SPACE, OUT OF MEMORY ERROR). I can see one lock file generated in the data/index path just after this error. On Tue, Aug 17, 2010 at 4:49 PM, Peter Karich peat...@yahoo.de wrote: Is there a way to verify that I have added it correctly? On linux you can do ps -elf | grep Boot and see if the java command has the parameters added. @all: why and when do you get those OOMs? While querying? Which queries in detail? Regards, Peter.
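For reference, the indexing settings discussed in this thread live in solrconfig.xml. A sketch with the conservative values suggested above (mergeFactor back at 10, a moderate ramBufferSizeMB); the exact numbers are workload-dependent, not a definitive recommendation:

```xml
<indexDefaults>
  <!-- Values suggested in the thread; tune for your workload. -->
  <mergeFactor>10</mergeFactor>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <maxBufferedDocs>1000</maxBufferedDocs>
  <!-- effectively unbounded; merges are then governed by mergeFactor -->
  <maxMergeDocs>2147483647</maxMergeDocs>
</indexDefaults>
```

A higher mergeFactor means more segments held open and merged at once, which raises memory pressure during large merges; that is why a value of 100 is suspect when chasing an OutOfMemoryError.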
Re: OutOfMemoryErrors
A ramBufferSizeMB of 128MB is generally preferred; more than that does not seem to improve performance. -- View this message in context: http://lucene.472066.n3.nabble.com/OutOfMemoryErrors-tp1181731p1199592.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr date NOW - format?
My first attempt was adding this to the dismax handler:

<str name="bq">pd:[NOW-1MONTH TO NOW]^5.0</str>
<str name="bq">pd:[NOW-3MONTHS TO NOW-1MONTH]^3.0</str>
<str name="bq">pd:[NOW-1YEAR TO NOW-3MONTHS]^2.0</str>
<str name="bq">pd:[* TO NOW-1YEAR]^1.0</str>

This results in scores that are quite a bit lower (9.5 max score instead of 11.7), but the order looks the same. No real change other than a higher max score (10) if I leave only the first bq entry. I wasn't able to figure out a way to put all the ranges in one bq; everything I tried got zero results. What am I doing wrong? On 8/17/2010 8:36 PM, Shawn Heisey wrote: Would I do separate bq values for each of the ranges, or is there a way to include them all at once? If it's the latter, I'll need a full example with a field name, because I'm clueless. :) On 8/17/2010 2:29 PM, Lance Norskog wrote: I think 'bq=' is what you want. In dismax the main query string is assumed to go against a bunch of fields. This query is in the standard (Lucene) format. The query strings should handle the ^number syntax. http://www.lucidimagination.com/search/document/CDRG_ch07_7.4.2.9 On Tue, Aug 17, 2010 at 9:40 AM, Shawn Heiseys...@elyograg.org wrote: On 4/9/2010 7:35 PM, Lance Norskog wrote: Function queries are notoriously slow. Another way to boost by year is with range queries: [NOW-6MONTHS TO NOW]^5.0, [NOW-1YEARS TO NOW-6MONTHS]^3.0, [NOW-2YEARS TO NOW-1YEARS]^2.0, [* TO NOW-2YEARS]^1.0. Notice that you get to have a non-linear curve when you select the ranges by hand. Lance, I have worked out my major issue and now have my post date in Solr as a tdate field named pd. I cannot however figure out how to actually send a query with a date boost like you've mentioned above. I'd like to embed it right into the dismax handler definition, but it would be good to also know how to send it in a query myself. Can you help? Are the boosts indicated above a multiplier, or an addition?
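To send the same boosts at query time rather than in the handler definition, the bq parameter can simply be repeated in the request URL, mirroring the repeated str elements above. A small self-contained Java sketch that builds such a URL (the host/port and core path are assumptions; the field name pd and ranges come from the thread):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Sketch: build a dismax query URL with repeated "bq" parameters,
// one per date range, matching the handler config discussed above.
public class DateBoostQuery {
    static String buildQuery(List<String[]> params) {
        StringBuilder sb = new StringBuilder();
        for (String[] kv : params) {
            if (sb.length() > 0) sb.append('&');
            sb.append(kv[0]).append('=')
              .append(URLEncoder.encode(kv[1], StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> params = Arrays.asList(
            new String[]{"q", "solr"},
            new String[]{"defType", "dismax"},
            new String[]{"bq", "pd:[NOW-1MONTH TO NOW]^5.0"},
            new String[]{"bq", "pd:[NOW-3MONTHS TO NOW-1MONTH]^3.0"},
            new String[]{"bq", "pd:[NOW-1YEAR TO NOW-3MONTHS]^2.0"});
        // Host and path are assumed; adjust for your install.
        System.out.println("http://localhost:8983/solr/select?" + buildQuery(params));
    }
}
```

Dismax accepts multiple bq parameters and applies each as an additive boost, which matches the "separate bq values" behavior observed above.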
Re: indexing???
hi, 1) i use tika 0.8... 2) the url is https://issues.apache.org/jira/browse/PDFBOX-709 and the file is samplerequestform.pdf 3) the entire error is:

curl "http://localhost:8080/solr/update/extract?stream.file=/home/satya/my_workings/satya_ebooks/8-Linux/samplerequestform.pdf&literal.id=linuxc"

(Apache Tomcat/6.0.26 error report; the HTML/CSS boilerplate of the error page is omitted)

HTTP Status 500 - org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@1d688e2 org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@1d688e2 at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:214) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:619) Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@1d688e2 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:144) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193) ... 
18 more Caused by: java.lang.ClassCastException: org.apache.pdfbox.pdmodel.font.PDFontDescriptorAFM cannot be cast to org.apache.pdfbox.pdmodel.font.PDFontDescriptorDictionary at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:167) at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:117) at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:140) at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:76) at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115) at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:225) at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:207) at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367) at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291) at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247) at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180) at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56) at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:79) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142) ... 21 more
Fun with Spatial (Haversine formula)
The Haversine formula in o.a.s.s.f.d.DistanceUtils.java gives these results for a 0.1 degree difference, in miles:

equator horizontal 0.1 deg: lat/lon 0.0/0.0   -> 396.320504
equator vertical   0.1 deg: lat/lon 0.0/0.0   -> 396.320504
NYC horizontal     0.1 deg: lat/lon -72.0/0.0 -> 383.33093669272654
NYC vertical       0.1 deg: lat/lon -72.0/0.0 -> 396.3204997747
arctic horizontal  0.1 deg: lat/lon 89.0/0.0  -> 202.13129169290906
arctic vertical    0.1 deg: lat/lon 89.0/0.0  -> 396.3204997747
N. Pole horizontal 0.1 deg: lat/lon 89.8/0.0  -> 103.61036292825034
N. Pole vertical   0.1 deg: lat/lon 89.8/0.0  -> 396.320500338

That is, for a horizontal shift of 0.1 at the equator, at New York City's latitude, 1 degree south of the North Pole, and almost-almost-almost at the North Pole, these are the distances in miles. The latitude changes make perfect sense, but one would expect the longitudes to shrink as well. Here is the code, added to DistanceUtils.java. What am I doing wrong?

public static void main(String[] args) {
    show("equator horizontal 0.1 deg", 0.0, 0.0, 0.0, 0.1);
    show("equator vertical   0.1 deg", 0.0, 0.0, 0.1, 0.0);
    show("NYC horizontal     0.1 deg", -72, 0.0, -72, 0.1);
    show("NYC vertical       0.1 deg", -72, 0, -72.1, 0.0);
    show("arctic horizontal  0.1 deg", 89.0, 0.0, 89.0, 0.1);
    show("arctic vertical    0.1 deg", 89.0, 0.0, 89.1, 0.0);
    show("N. Pole horizontal 0.1 deg", 89.8, 0.0, 89.8, 0.1);
    show("N. Pole vertical   0.1 deg", 89.8, 0.0, 89.9, 0.0);
}

private static void show(String label, double d, double e, double f, double g) {
    System.out.println(label + ": lat/lon " + d + "/" + e + "\t-> " + haversine(d, e, f, g, 3963.205));
}

(This is from the Solr trunk.) -- Lance Norskog goks...@gmail.com
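An observation worth checking (my reading, not from the thread): 0.1 × 3963.205 ≈ 396.32, exactly the constant "vertical" figure above, which is what a haversine yields when 0.1-degree inputs are treated as if they were already radians. A self-contained haversine in plain Java that converts degrees to radians first gives roughly 6.9 miles for a 0.1 degree latitude shift:

```java
// Self-contained haversine sketch (not the Solr implementation);
// inputs in degrees, converted to radians before applying the formula.
public class Haversine {
    static double haversineMiles(double lat1, double lon1, double lat2, double lon2) {
        double radius = 3963.205; // earth radius in miles, as in the thread
        double rlat1 = Math.toRadians(lat1), rlat2 = Math.toRadians(lat2);
        double dlat = Math.toRadians(lat2 - lat1), dlon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dlat / 2), 2)
                 + Math.cos(rlat1) * Math.cos(rlat2) * Math.pow(Math.sin(dlon / 2), 2);
        return 2 * radius * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        System.out.println(haversineMiles(0.0, 0.0, 0.1, 0.0));   // ~6.92 miles, not 396.32
        System.out.println(haversineMiles(89.8, 0.0, 89.8, 0.1)); // horizontal distance shrinks near the pole
    }
}
```

If the DistanceUtils method expects radians, the test harness above would need Math.toRadians on each argument before calling it.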