Re: Git repo
http://git.apache.org/

On Sun, Feb 19, 2012 at 7:50 PM, Mark Diggory mdigg...@atmire.com wrote: Is there a git repo location that mirrors apache svn repos for solr? Cheers, Mark (@mire Inc., http://www.atmire.com)

--
Igor Milovanović
https://twitter.com/#!/f13o | http://about.me/igor.milovanovic | http://umotvorine.com/
Re: Development inside or outside of Solr?
I have looked into the TikaCLI with the -language option, and learned that Tika can output only the language metadata. It cannot help me solve my problem though, as my main concern is whether to change Solr or not. Thank you all the same. -- View this message in context: http://lucene.472066.n3.nabble.com/Development-inside-or-outside-of-Solr-tp3759680p3760131.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr logging
Hi, I want to set up my Solr to use log4j and to write log messages into a separate file instead of writing everything to standard output. How can I do it? Which jars should I add? Where should I put the log4j.xml file? Regards, Alex
Re: Solr logging
I got a similar question in the past :) http://lucene.472066.n3.nabble.com/Jetty-logging-td3476715.html#a3483146 I hope it will help you.
Re: Solr logging
Thanks a lot. I've added (and deleted) those libraries and now I don't get these messages on stdout :) I see that log4j is running but it can't find its config file. I wish I could add this to the solr.war. Is this possible? I want to avoid setting parameters in Glassfish. Regards, Alex

On Mon, Feb 20, 2012 at 9:58 AM, darul daru...@gmail.com wrote: I got a similar question in the past :) http://lucene.472066.n3.nabble.com/Jetty-logging-td3476715.html#a3483146
processing of merged tokens
Hello all,

For our search system we'd like to be able to process merged tokens, i.e. when a user enters a query like "hotelsin barcelona", we'd like to know that the user means "hotels in barcelona".

At some point in the past we implemented this kind of functionality with shingles (using ShingleFilter): if we were indexing the sentence "hotels in barcelona" as a document, we'd be able to match merged tokens like "hotelsin" and "inbarcelona" at query time. This solution has two problems: 1) the index size increases a lot, and 2) we only catch a small percentage of the possibilities; merged tokens like "hotelsbarcelona" or "barcelonahotels" cannot be processed.

Our intuition is that there should be a better solution. Maybe it's already solved in Solr or Lucene and we haven't found it yet. If it's not, I can imagine a naive solution that would use TermsEnum to check whether a token exists in the index, and if it doesn't, use the TermsEnum again to check whether it's a concatenation of two known tokens. It's highly likely that there are much better solutions and algorithms for this. It would be great if you could help us identify the best way to solve this problem.

Thanks a lot for your help,
Carlos

Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com
Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
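The naive TermsEnum idea described above can be sketched in plain Python (an illustrative stand-in, not the Lucene API; the term set here plays the role of the index's term dictionary):

```python
# Illustrative sketch of the naive merged-token approach: if a query token
# is not in the term dictionary, try to split it into two known terms.
def split_merged_token(token, known_terms):
    """Return (left, right) if token is a concatenation of two known terms,
    (token,) if it is already known, or None otherwise."""
    if token in known_terms:
        return (token,)
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left in known_terms and right in known_terms:
            return (left, right)
    return None

terms = {"hotels", "in", "barcelona"}
print(split_merged_token("hotelsin", terms))         # → ('hotels', 'in')
print(split_merged_token("barcelonahotels", terms))  # → ('barcelona', 'hotels')
```

In a real implementation the membership tests would be TermsEnum lookups against the index, and one would likely rank candidate splits by term frequency rather than taking the first match.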
Re: Solr logging
Yes, you can update your .war archive by adding/removing the expected jars.
Re: Solr logging
I've already done that. What I'm more interested in is whether I can add log4j.xml to the war, and where to put it to make it work.

On Mon, Feb 20, 2012 at 10:49 AM, darul daru...@gmail.com wrote: Yes, you can update your .war archive by adding/removing expected jars.
Re: Solr logging
Hmm, I did not try to achieve this, but I'm interested if you find a way... That said, I believe that having the log4j config file outside the war archive is a better solution, in case you need to update its content.
Re: Solr logging
Yep, I suppose it is. But I have several applications installed on Glassfish and I want each one of them to write into a separate file, and your solution with this JVM option was redirecting all messages from all apps to one file. Does anyone know how to accomplish that?

On Mon, Feb 20, 2012 at 11:09 AM, darul daru...@gmail.com wrote: Hmm, I did not try to achieve this but interested if you find a way... [...]
Re: Solr logging
This case is explained here: http://stackoverflow.com/questions/762918/how-to-configure-multiple-log4j-for-different-wars-in-a-single-ear

http://techcrawler.wordpress.com/
Re: Payload and exact search - 2
OK, it works!! Thank you very much. Leonardo
solr and tika
Hi all, in a new installation of Solr (1.4) I configured Tika for indexing rich documents. So I commit my files, and after indexing I can find them with an HTTP query: http://localhost:8983/solr/select?q=attr_content:parola (to search for the word 'parola'), and I find the committed text. But if I search from the Solr front panel, the result is '0 documents'. Suggestions? Thanks, Alessio
Re: Development inside or outside of Solr?
You could take a look at this: http://www.let.rug.nl/vannoord/TextCat/ It will probably require some work to integrate/implement, though.

François

On Feb 20, 2012, at 3:37 AM, bing wrote: I have looked into the TikaCLI with -language option, and learned that Tika can output only the language metadata. [...]
Re: Solr logging
Ola, here is what I have for this:

##
# Log4J configuration for Solr
# http://wiki.apache.org/solr/SolrLogging
#
# 1) Download log4j:
#    http://logging.apache.org/log4j/1.2/download.html
#    (apache-log4j-1.2.16.tar.gz)
#
# 2) Download SLF4J:
#    http://www.slf4j.org/download.html
#    (slf4j-1.6.4.tar.gz)
#
# 3) Unpack Solr:
#    jar xvf apache-solr-3.5.0.war
#
# 4) Delete:
#    WEB-INF/lib/log4j-over-slf4j-1.6.4.jar
#    WEB-INF/lib/slf4j-jdk14-1.6.4.jar
#
# 5) Copy:
#    apache-log4j-1.2.16/log4j-1.2.16.jar -> WEB-INF/lib
#    slf4j-1.6.4/slf4j-log4j12-1.6.4.jar  -> WEB-INF/lib
#    log4j.properties (this file)         -> WEB-INF/classes/ (needs to be created)
#
# 6) Pack Solr:
#    jar cvf apache-solr-3.4.0-omim.war admin favicon.ico index.jsp META-INF WEB-INF
#
# Author: Francois Schiettecatte
# Version: 1.0
##

# Logging levels (helpful reminder): DEBUG INFO WARN ERROR FATAL

# Logging setup
log4j.rootLogger=WARN, SOLR

# Daily Rolling File Appender (SOLR)
log4j.appender.SOLR=org.apache.log4j.DailyRollingFileAppender
log4j.appender.SOLR.File=${catalina.base}/logs/solr.log
log4j.appender.SOLR.Append=true
log4j.appender.SOLR.Encoding=UTF-8
log4j.appender.SOLR.DatePattern='-'yyyy-MM-dd
log4j.appender.SOLR.layout=org.apache.log4j.PatternLayout
log4j.appender.SOLR.layout.ConversionPattern=%d [%t] %-5p %c - %m%n

# Default logging level for Solr
log4j.logger.org.apache.solr=WARN

On Feb 20, 2012, at 5:15 AM, ola nowak wrote: Yep. I suppose it is. But I have several applications installed on glassfish and I want each one of them to write into separate file. [...]
Problem with SolrCloud + Zookeeper + DataImportHandler
Hi All, I've recently downloaded the latest Solr trunk to configure SolrCloud with ZooKeeper, using the standard configuration from the wiki: http://wiki.apache.org/solr/SolrCloud. The problem occurred when I tried to configure the DataImportHandler in solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

After starting Solr with ZooKeeper I got these errors:

Feb 20, 2012 11:30:12 AM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException
  at org.apache.solr.core.SolrCore.init(SolrCore.java:606)
  at org.apache.solr.core.SolrCore.init(SolrCore.java:490)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:705)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:442)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:313)
  at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:262)
  at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:98)
  at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
  at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
  at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
  at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
  at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
  at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
  at org.mortbay.jetty.Server.doStart(Server.java:224)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.mortbay.start.Main.invokeMain(Main.java:194)
  at org.mortbay.start.Main.start(Main.java:534)
  at org.mortbay.start.Main.start(Main.java:441)
  at org.mortbay.start.Main.main(Main.java:119)
Caused by: org.apache.solr.common.SolrException: FATAL: Could not create importer. DataImporter config invalid
  at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:120)
  at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:542)
  at org.apache.solr.core.SolrCore.init(SolrCore.java:601)
  ... 31 more
Caused by: org.apache.solr.common.cloud.ZooKeeperException: ZkSolrResourceLoader does not support getConfigDir() - likely, w
  at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:99)
  at org.apache.solr.handler.dataimport.SimplePropertiesWriter.init(SimplePropertiesWriter.java:47)
  at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:112)
  at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114)
  ...
33 more

I've checked that the file db-data-config.xml is available in ZooKeeper:

[zk: localhost:2181(CONNECTED) 0] ls /configs/conf1
[admin-extra.menu-top.html, dict, solrconfig.xml, dataimport.properties, admin-extra.html, solrconfig.xml.old, solrconfig.xml.new, solrconfig.xml~, xslt, db-data-config.xml, velocity, elevate.xml, admin-extra.menu-bottom.html, solrconfig.xml.dataimport, schema.xml]
[zk: localhost:2181(CONNECTED) 1]

Is it possible to configure DIH with ZooKeeper? And how to do it? I'm a little confused by that.

Regards,
Agnieszka Kukalowicz
Re: custom scoring
Hello all: We've done some tests with Em's approach of putting a BooleanQuery in front of our user query, that is:

BooleanQuery
  must (DismaxQuery)
  should (FunctionQuery)

The FunctionQuery obtains the Solr IR score by means of a QueryValueSource, then takes the SQRT of this value, and then multiplies it by our custom query_score float, pulled in by means of a FieldCacheSource.

In particular, we've proceeded in the following way:
- we've loaded the whole index into the page cache of the OS to make sure we don't have disk IO problems that might affect the benchmarks (our machine has enough memory to load all of the index in RAM)
- we've executed an out-of-benchmark query 10-20 times to make sure that everything is JITted and that Lucene's FieldCache is properly populated
- we've disabled all the caches (filter query cache, document cache, query cache)
- we've executed 8 different user queries with and without FunctionQueries, with early termination in both cases (our collector stops after collecting 50 documents per shard)

Em was correct: the query is much faster with the BooleanQuery in front, but it's still 30-40% slower than the query without FunctionQueries. Although one may think it's reasonable that the query response time increases because of the extra computations, we believe the increase is too big, given that we're collecting just 500-600 documents due to the early query termination techniques we currently use. Any ideas on how to make it faster?

Thanks a lot,
Carlos

On Fri, Feb 17, 2012 at 11:07 AM, Carlos Gonzalez-Cadenas c...@experienceon.com wrote: Thanks Em, Robert, Chris for your time and valuable advice. We'll make some tests and will let you know soon.
On Thu, Feb 16, 2012 at 11:43 PM, Em mailformailingli...@yahoo.de wrote: Hello Carlos, I think we misunderstood each other. As an example:

BooleanQuery (
  clauses: (
    MustMatch(
      DisjunctionMaxQuery(
        TermQuery(stopword_field, barcelona),
        TermQuery(stopword_field, hoteles)
      )
    ),
    ShouldMatch(
      FunctionQuery( *please insert your function here* )
    )
  )
)

Explanation: You construct an artificial BooleanQuery which wraps your user's query as well as your function query. Your user's query - in that case - is just a DisjunctionMaxQuery consisting of two TermQueries. In the real world you might construct another BooleanQuery around your DisjunctionMaxQuery in order to have more flexibility. However, the interesting part of the given example is that we specify the user's query as a MustMatch condition of the BooleanQuery and the FunctionQuery just as a ShouldMatch. Constructed that way, I expect the FunctionQuery to score only those documents which fit the MustMatch condition. I conclude that from the fact that the FunctionQuery class also has a skipTo method, and I would expect that the scorer will use it to score only matching documents (however, I did not check where and how it might get called). If my conclusion is wrong, then hopefully Robert Muir (as far as I can see, the author of that class) can tell us what the intention was in constructing an every-time-match-all FunctionQuery. Can you validate whether your QueryParser constructs a query in the form I drew above?
Regards, Em

Am 16.02.2012 20:29, schrieb Carlos Gonzalez-Cadenas: Hello Em: 1) Here's a printout of an example DisMax query (as you can see, mostly MUST terms except for some SHOULD terms used for boosting scores for stopwords):

((+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona stopword_shortened_phrase:en) | (+stopword_phrase:hoteles +stopword_phrase:barcelona stopword_phrase:en) | (+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona stopword_shortened_phrase:en) | (+stopword_phrase:hoteles +stopword_phrase:barcelona stopword_phrase:en) | (+stopword_shortened_phrase:hoteles +wildcard_stopword_shortened_phrase:barcelona stopword_shortened_phrase:en) | (+stopword_phrase:hoteles +wildcard_stopword_phrase:barcelona stopword_phrase:en) | (+stopword_shortened_phrase:hoteles +wildcard_stopword_shortened_phrase:barcelona stopword_shortened_phrase:en) | (+stopword_phrase:hoteles +wildcard_stopword_phrase:barcelona stopword_phrase:en))

2) The collector is inserted in the SolrIndexSearcher (replacing the TimeLimitingCollector). We trigger it through the Solr interface by passing the timeAllowed parameter. We know this is a hack, but AFAIK there's no out-of-the-box way to specify custom collectors by now (
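Em's expectation, that the SHOULD FunctionQuery contributes only to documents passing the MUST clause, can be illustrated with a toy Python simulation (this is not Lucene code; the sqrt(score) * query_score formula mirrors the function discussed in this thread):

```python
import math

# Toy simulation of BooleanQuery(must=user query, should=function query):
# documents failing the required clause are skipped entirely, so the
# function score is only ever computed for matching documents.
def score_docs(docs, matches_user_query, ir_score, query_score):
    """Return {doc: final_score} for docs matching the required clause,
    where the optional clause adds sqrt(ir_score) * query_score."""
    results = {}
    for doc in docs:
        if not matches_user_query(doc):
            continue  # MUST clause failed: the optional function query never runs
        base = ir_score(doc)  # stands in for the Lucene IR score
        results[doc] = base + math.sqrt(base) * query_score(doc)
    return results

scores = score_docs(["d1", "d2"], lambda d: d == "d1",
                    lambda d: 4.0, lambda d: 0.5)
print(scores)  # → {'d1': 5.0}: 4.0 + sqrt(4.0) * 0.5; 'd2' is never scored
```

The remaining cost Carlos observes would then come from evaluating the function for the documents that do match, which early termination limits but does not eliminate.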
Re: custom scoring
Carlos, nice to hear that the approach helped you! Could you show us what your query request looks like after reworking?

Regards, Em

Am 20.02.2012 13:30, schrieb Carlos Gonzalez-Cadenas: Hello all: We've done some tests with Em's approach of putting a BooleanQuery in front of our user query [...] Any ideas on how to make it faster?
Re: Development inside or outside of Solr?
Either is possible. For the first, you would write a custom update processor that handled the dual Tika call... For the second, consider writing a SolrJ program that just does it all on the client: download Tika from the Apache project (or tease out all the jars from the Solr distro) and then make it all work on the client. Here's a sample app: http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/

Best,
Erick

On Sun, Feb 19, 2012 at 9:44 PM, bing jsuser1...@hotmail.com wrote: Hi all, I am deploying a multicore Solr server running on Tomcat, where I want to achieve language detection during index/query. Solr 3.5.0 has a wrapped Tika API that can do language detection. Currently, the default behavior of Solr 3.5.0 is that every time I index a document, Solr calls the Tika API to produce the language detection result, i.e. indexing and detection happen at the same time. However, I would like to have the language detection result first, and then decide which core to put the document in, i.e. detection happens before indexing. It seems that I need to do development in one of the following ways: 1. revise Solr itself, changing its default behavior; 2. or write a Java client outside Solr, and call the client through the server (JSP maybe) at index/query time. Can anyone who has met similar conditions give some suggestions about the advantages and disadvantages of the two approaches? Any other alternatives? Thank you. Best, Bing
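The client-side "detect first, then index" flow could look like the following Python sketch (illustrative only: the detector is a stub standing in for a real language identifier such as Tika's, and the core names are made up):

```python
# Sketch of routing a document to a core based on a detected language.
# detect_language is injected so a real detector (e.g. Tika via a Java
# client, or any language-ID library) can be swapped in.
def pick_core(text, detect_language, core_by_lang, default_core):
    """Choose the target Solr core name for a document's text."""
    lang = detect_language(text)
    return core_by_lang.get(lang, default_core)

# A toy detector for demonstration; real code would call an actual
# language identifier instead of this keyword heuristic.
def toy_detector(text):
    return "es" if " el " in f" {text} " else "en"

cores = {"en": "core_en", "es": "core_es"}
print(pick_core("el hotel en barcelona", toy_detector, cores, "core_en"))  # → core_es
print(pick_core("hotels in barcelona", toy_detector, cores, "core_en"))    # → core_en
```

The actual indexing step would then post the document to the chosen core's update handler, which is exactly the part SolrJ handles in the sample app linked above.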
Re: custom scoring
Yeah Em, it helped a lot :) Here it is (for the user query "hoteles"):

+(stopword_shortened_phrase:hoteles | stopword_phrase:hoteles | wildcard_stopword_shortened_phrase:hoteles | wildcard_stopword_phrase:hoteles) product(pow(query((stopword_shortened_phrase:hoteles | stopword_phrase:hoteles | wildcard_stopword_shortened_phrase:hoteles | wildcard_stopword_phrase:hoteles),def=0.0),const(0.5)),float(query_score))

Thanks a lot for your help,
Carlos

On Mon, Feb 20, 2012 at 1:50 PM, Em mailformailingli...@yahoo.de wrote: Carlos, nice to hear that the approach helped you! Could you show us how your query-request looks like after reworking? [...]
How to check for inactive cores in a solr multicore setup?
Hello, I am trying to figure out a way to detect inactive cores in a multicore setup. How is that possible? I queried the STATUS of a core through the CoreAdminHandler. Could anyone please tell me what the 'current' field means?

E.g.: http://localhost:8080/solr/admin/cores?action=STATUS&core=2

Response:

<lst name="2">
  <str name="name">2</str>
  <str name="instanceDir">multicore/solr/2/</str>
  <str name="dataDir">multicore/solr/2/data/</str>
  <date name="startTime">2012-02-17T06:19:20.805Z</date>
  <long name="uptime">279811925</long>
  <lst name="index">
    <int name="numDocs">72373</int>
    <int name="maxDoc">81487</int>
    <long name="version">1328696930153</long>
    <int name="segmentCount">12</int>
    <bool name="current">true</bool>
    <bool name="hasDeletions">true</bool>
    <str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@multicore/solr/2/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@4cd0b9d7</str>
    <date name="lastModified">2012-02-20T12:02:12Z</date>
  </lst>
</lst>

Please help. Thanks, Nasima
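Extracting fields like 'current' from a STATUS response can be done with a small script. Here's an illustrative Python sketch that parses a simplified, made-up sample of such a response (a real client would fetch the XML from the CoreAdmin STATUS URL first):

```python
import xml.etree.ElementTree as ET

# A simplified, hand-written sample of a CoreAdmin STATUS response; the
# real payload contains many more fields per core.
SAMPLE = """<response>
  <lst name="status">
    <lst name="2">
      <str name="name">2</str>
      <lst name="index">
        <bool name="current">true</bool>
      </lst>
    </lst>
  </lst>
</response>"""

def core_current_flags(status_xml):
    """Return {core_name: bool} mapping each core to its 'current' flag."""
    root = ET.fromstring(status_xml)
    flags = {}
    for core in root.findall("./lst[@name='status']/lst"):
        flag = core.find("./lst[@name='index']/bool[@name='current']")
        flags[core.get("name")] = (flag is not None and flag.text == "true")
    return flags

print(core_current_flags(SAMPLE))  # → {'2': True}
```

Note the wrapping <lst name="status"> element assumed here is how the full STATUS response groups cores; if you query a single core as in the example above, adjust the XPath accordingly.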
RE: customizing standard tokenizer
Thanks, I will use the custom tokenizer. It's less error-prone than the workarounds mentioned.
Re: DataImportHandler running out of memory
DIH is still running out of memory for me, with a Full Import on a database of size 1.5 GB. Solr version: 3.5.0. Note that I have already added batchSize="-1" but am getting the same error. Sharing my DIH config below.

<dataConfig>
  <dataSource type="JdbcDataSource" name="jdbc" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/ib" user="root" password="root" batchSize="-1"/>
  <document name="content">
    <entity name="issue" dataSource="jdbc"
            transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer" pk="id"
            query="select ib_issue.`_id` as id, ib_issue.`_issue_title` as issueTitle, ib_issue.`_issue_descr` as issueDescr, createdBy.`_name` as issueCreatedByName, createdBy.`_email` as issueCreatedByEmail from `ib_issue` inner join `ib_user` as createdBy on createdBy.`_id` = ib_issue.`_created_by_user_id` group by ib_issue.`_id`"/>
  </document>
</dataConfig>

Please find the error trace below:

2012-02-20 19:04:40.531:INFO::Started SocketConnector@0.0.0.0:8983
Feb 20, 2012 7:04:57 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={command=status&qt=/dih_ib_jdbc} status=0 QTime=0
Feb 20, 2012 7:04:58 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={command=show-config&qt=/dih_ib_jdbc} status=0 QTime=0
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Feb 20, 2012 7:05:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dih_ib_jdbc params={command=full-import} status=0 QTime=0
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dih_ib_jdbc.properties
Feb 20, 2012 7:05:30 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Feb 20, 2012 7:05:30 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
commit{dir=E:\workspace\solr_3_5_0\example\solr\data\index,segFN=segments_1,version=1329744880204,generation=1,filenames=[segments_1]
Feb 20, 2012 7:05:30 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1329744880204
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity issue with URL: jdbc:mysql://localhost:3306/issueburner
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 172
Feb 20, 2012 7:07:45 PM org.apache.solr.common.SolrException log
SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:377)
    at org.apache.lucene.store.DataOutput.writeString(DataOutput.java:103)
    at org.apache.lucene.index.FieldsWriter.writeField(FieldsWriter.java:200)
    at org.apache.lucene.index.StoredFieldsWriterPerThread.addField(StoredFieldsWriterPerThread.java:58)
    at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:265)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2327)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2299)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:240)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
    at
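[Editor's note: for reference, the part of the config above that controls streaming is the dataSource element. As far as I know, on Solr 3.x DIH translates batchSize="-1" into setFetchSize(Integer.MIN_VALUE) on the JDBC statement, which makes MySQL Connector/J stream rows one at a time instead of buffering the whole result set in the heap. A sketch, with the same placeholder url/user/password as above:]

```xml
<!-- Sketch of the relevant fragment of the config quoted above.
     batchSize="-1" => fetchSize = Integer.MIN_VALUE, i.e. row streaming
     with MySQL Connector/J. -->
<dataSource type="JdbcDataSource" name="jdbc"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/ib"
            user="root" password="root"
            batchSize="-1"/>
```

Since the OutOfMemoryError above is thrown while writing stored fields rather than while fetching rows, streaming alone may not be enough; raising the JVM heap (for example `java -Xmx1024m -jar start.jar`) and checking for unusually large column values would be the next things to try.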
Re: custom scoring
Could you please provide me the original request (the HTTP-request)? I am a little bit confused to what query_score refers. As far as I can see it isn't a magic-value. Kind regards, Em Am 20.02.2012 14:05, schrieb Carlos Gonzalez-Cadenas: Yeah Em, it helped a lot :) Here it is (for the user query hoteles): *+(stopword_shortened_phrase:hoteles | stopword_phrase:hoteles | wildcard_stopword_shortened_phrase:hoteles | wildcard_stopword_phrase:hoteles) * *product(pow(query((stopword_shortened_phrase:hoteles | stopword_phrase:hoteles | wildcard_stopword_shortened_phrase:hoteles | wildcard_stopword_phrase:hoteles),def=0.0),const(0.5)),float(query_score))* Thanks a lot for your help. Carlos Carlos Gonzalez-Cadenas CEO, ExperienceOn - New generation search http://www.experienceon.com Mobile: +34 652 911 201 Skype: carlosgonzalezcadenas LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas On Mon, Feb 20, 2012 at 1:50 PM, Em mailformailingli...@yahoo.de wrote: Carlos, nice to hear that the approach helped you! Could you show us how your query-request looks like after reworking? Regards, Em Am 20.02.2012 13:30, schrieb Carlos Gonzalez-Cadenas: Hello all: We've done some tests with Em's approach of putting a BooleanQuery in front of our user query, that means: BooleanQuery must (DismaxQuery) should (FunctionQuery) The FunctionQuery obtains the SOLR IR score by means of a QueryValueSource, then does the SQRT of this value, and then multiplies it by our custom query_score float, pulling it by means of a FieldCacheSource. In particular, we've proceeded in the following way: - we've loaded the whole index in the page cache of the OS to make sure we don't have disk IO problems that might affect the benchmarks (our machine has enough memory to load all the index in RAM) - we've executed an out-of-benchmark query 10-20 times to make sure that everything is jitted and that Lucene's FieldCache is properly populated. 
- we've disabled all the caches (filter query cache, document cache, query cache) - we've executed 8 different user queries with and without FunctionQueries, with early termination in both cases (our collector stops after collecting 50 documents per shard) Em was correct, the query is much faster with the BooleanQuery in front, but it's still 30-40% slower than the query without FunctionQueries. Although one may think that it's reasonable that the query response time increases because of the extra computations, we believe that the increase is too big, given that we're collecting just 500-600 documents due to the early query termination techniques we currently use. Any ideas on how to make it faster?. Thanks a lot, Carlos Carlos Gonzalez-Cadenas CEO, ExperienceOn - New generation search http://www.experienceon.com Mobile: +34 652 911 201 Skype: carlosgonzalezcadenas LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas On Fri, Feb 17, 2012 at 11:07 AM, Carlos Gonzalez-Cadenas c...@experienceon.com wrote: Thanks Em, Robert, Chris for your time and valuable advice. We'll make some tests and will let you know soon. On Thu, Feb 16, 2012 at 11:43 PM, Em mailformailingli...@yahoo.de wrote: Hello Carlos, I think we missunderstood eachother. As an example: BooleanQuery ( clauses: ( MustMatch( DisjunctionMaxQuery( TermQuery(stopword_field, barcelona), TermQuery(stopword_field, hoteles) ) ), ShouldMatch( FunctionQuery( *please insert your function here* ) ) ) ) Explanation: You construct an artificial BooleanQuery which wraps your user's query as well as your function query. Your user's query - in that case - is just a DisjunctionMaxQuery consisting of two TermQueries. In the real world you might construct another BooleanQuery around your DisjunctionMaxQuery in order to have more flexibility. However the interesting part of the given example is, that we specify the user's query as a MustMatch-condition of the BooleanQuery and the FunctionQuery just as a ShouldMatch. 
Constructed that way, I am expecting the FunctionQuery to score only those documents which fit the MustMatch condition. I conclude that from the fact that the FunctionQuery class also has a skipTo method, and I would expect that the scorer will use it to score only matching documents (however, I did not search where and how it might get called). If my conclusion is wrong, then hopefully Robert Muir (as far as I can see, the author of that class) can tell us what the intention was in constructing an every-time-match-all FunctionQuery. Can you validate whether your QueryParser constructs a query in the form I drew above? Regards, Em Am 16.02.2012 20:29, schrieb Carlos Gonzalez-Cadenas: Hello Em: 1) Here's a printout of an
Re: custom scoring
Hi Em: The HTTP request is not gonna help you a lot because we use a custom QParser (that builds the query that I've pasted before). In any case, here it is: http://localhost:8080/solr/core0/select?shards=…(shards here)…indent=onwt=exontimeAllowed=50fl=resulting_phrase%2Cquery_id%2Ctype%2Chighlightingstart=0rows=16limit=20q=%7B!exonautocomplete%7Dhoteleshttp://localhost:8080/solr/core0/select?shards=exp302%3A8983%2Fsolr%2Fcore0%2Cexp302%3A8983%2Fsolr%2Fcore1%2Cexp302%3A8983%2Fsolr%2Fcore2%2Cexp302%3A8983%2Fsolr%2Fcore3%2Cexp302%3A8983%2Fsolr%2Fcore4%2Cexp302%3A8983%2Fsolr%2Fcore5%2Cexp302%3A8983%2Fsolr%2Fcore6%2Cexp302%3A8983%2Fsolr%2Fcore7%2Cexp302%3A8983%2Fsolr%2Fcore8%2Cexp302%3A8983%2Fsolr%2Fcore9%2Cexp302%3A8983%2Fsolr%2Fcore10%2Cexp302%3A8983%2Fsolr%2Fcore11sort=score%20desc%2C%20query_score%20descindent=onwt=exontimeAllowed=50fl=resulting_phrase%2Cquery_id%2Ctype%2Chighlightingstart=0vrows=4rows=16limit=20q=%7B!exonautocomplete%7DBARCELONAgyvl7cn3 We're implementing a query autocomplete system, therefore our Lucene documents are queries. query_score is a field that is indexed and stored with every document. It expresses how popular a given query is (i.e. common queries like hotels in barcelona have a bigger query_score than less common queries like hotels in barcelona near the beach). Let me know if you need something else. Thanks, Carlos Carlos Gonzalez-Cadenas CEO, ExperienceOn - New generation search http://www.experienceon.com Mobile: +34 652 911 201 Skype: carlosgonzalezcadenas LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas On Mon, Feb 20, 2012 at 3:12 PM, Em mailformailingli...@yahoo.de wrote: Could you please provide me the original request (the HTTP-request)? I am a little bit confused to what query_score refers. As far as I can see it isn't a magic-value. 
Kind regards, Em
postCommit confusion?
in a Solr master/slave replication, if I register a postCommit listener on a slave, which index reader should I get if I do:

@Override
public final void postCommit() {
    final RefCounted<SolrIndexSearcher> refC = core.getNewestSearcher(true);
    try {
        final Map<String, String> userData =
                refC.get().getIndexReader().getIndexCommit().getUserData();
        // do something with userData
    } catch (IOException e) {
        log.error("PostCommit: ", e);
    } finally {
        refC.decref();
    }
}

What I observed is that I get stale userData; is this correct? Doesn't commit replace the IndexReader at the actual commit point? (I observe userData that was there before replication finished, but I expected to see the userData version from the master at this stage.) If I force core.openNewSearcher(false, false); I get the correct, replicated userData I just received from the master… What am I doing wrong? What is the contract of core.getNewestSearcher(true) when called in postCommit(), or better, when does Solr update the commit point? Not so important for the particular problem, but interesting to know these life cycles. Thanks, eks
Is Sphinx better suited to me, or should I look at Solr?
I am creating what is effectively a search engine. Content is collected via spiders and then inserted into my database, where it becomes searchable and filterable. I envision there being around 90K records to be searched at any one time. The content is blog posts and forum posts, so we are basically looking at full text with some additional filters based on location, category and date posted. What is really important to me is speed and relevancy. The index size or index time really isn't too big of an issue. From the benchmarks I have seen it looks like Sphinx is much faster at querying data and showing results, but that Solr has improved relevancy. My website is coded entirely in PHP and I am planning on using a MySQL database. Can anyone please give me a bit of input and help me decide which product might be better suited to me. Regards, James -- View this message in context: http://lucene.472066.n3.nabble.com/Is-Sphinx-better-suited-to-me-or-should-I-look-at-Solr-tp3760988p3760988.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: custom scoring
Hi Carlos, query_score is a field that is indexed and stored with every document. Thanks for clarifying that, now the whole query-string makes more sense to me. Did you check whether query() - without product() and pow() - is also much slower than a normal query? I guess, if the performance-decrease without product() and pow() is not that large, you are hitting the small overhead that comes with every function query. It would be nice, if you could check that. However, let's take a step back and look what you really want to achieve instead of how you are trying to achieve it right now. You want to influence the score of your actual query by a value that represents a combination of some static values and the likelyness of how good a query matches a document. From your query, I can see that you are using the same fields in your FunctionQuery and within your MainQuery (let's call the q-param MainQuery). This means that the scores of your query()-method and your MainQuery should be identical. Let's call this value just score and rename your field query_score popularity. I don't know how you are implementing the FunctionQuery (boost by multiplication, boost by addition), but it seems clear to me that your formula looks this way: score x (score^0.5*popularity) where x is kind of an operator (+,*,...) Why don't you reduce it to score * boost(log(popularity)). This is a trade-off between precision and performance. You could even improve the above by setting the doc's boost equal to log(populary) at indexing time. What do you think about that? Regards, Em Am 20.02.2012 15:37, schrieb Carlos Gonzalez-Cadenas: Hi Em: The HTTP request is not gonna help you a lot because we use a custom QParser (that builds the query that I've pasted before). 
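[Editor's note: Em's two formulas can be compared with plain arithmetic. A small sketch (not Solr code; the scores and popularity values are invented, and it assumes the should-clause simply adds the FunctionQuery value to the main score):]

```java
// Sketch comparing the thread's current combination,
//   score + sqrt(score) * popularity   (should-clause adds the function value)
// with Em's cheaper suggestion,
//   score * log(popularity).
// All numbers are invented for illustration.
public class BoostSketch {
    public static double current(double score, double popularity) {
        return score + Math.sqrt(score) * popularity;
    }

    public static double suggested(double score, double popularity) {
        return score * Math.log(popularity);
    }

    public static void main(String[] args) {
        // weaker text match on a very popular query vs.
        // stronger text match on a less popular query
        System.out.println(current(2.0, 50.0) + " vs " + current(4.0, 10.0));
        System.out.println(suggested(2.0, 50.0) + " vs " + suggested(4.0, 10.0));
    }
}
```

With the current formula the very popular document wins regardless of text relevance (72.7 vs 24.0); with the log-dampened boost the stronger text match wins (7.82 vs 9.21), and the log form is also the cheaper computation at query time.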
How to index a facetfield by searching words matching from another Textfield
Hi everyone, I'm a new Solr user, but I used to work on Endeca. There is a module called Text Tagger in Endeca that auto-indexes values into a facet field (multivalued) when it finds words (from a given word list) in another text field of that document. I didn't see any subjects or any way to do this with Solr? Thanks in advance ;) -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3761201.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is Sphinx better suited to me, or should I look at Solr?
Hi James, I can not speak for Sphinx, since I never used it. However, from reading your requirements there is nothing that fears Solr. Although Sphinx is written in C++, running Solr on top of a HotSpot JVM gives you high performance. Furthermore the HotSpot JVM is optimizing your code at runtime, which sometimes allows long-running applications to run as fast as software written in C++ (and sometimes even faster). Given that Solr is pretty fast and scalable (90k docs are a really small index), you should have a closer look at the features each search-server provides to you and how they suit your needs. You should always keep in mind that users will gladly wait a few milliseconds longer for their highly-relevant search-results, but do not care about a blazing fast 5ms response-time for a collection of trash-results. So try to find out what your concrete needs in terms of relevancy are and which search-server provides you the tools to go. I am pretty sure that both projects provide you php-client-libraries etc. for indexing and searching (Solr does). Kind regards, Em
Re: How to index a facetfield by searching words matching from another Textfield
Hi Xavier, sounds like a job for KeepWordFilter! From the javadocs: "A TokenFilter that only keeps tokens with text contained in the required words. This filter behaves like the inverse of StopFilter." However, you have to provide the wordslist as a .txt-file. By using copyFields and the KeepWordFilter you are able to achieve what you want. Kind regards, Em
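[Editor's note: a minimal schema.xml sketch of Em's suggestion. All field, type and file names here are invented examples, not from the thread: the free-text field is copied into a facet field whose analyzer keeps only the words from the list.]

```xml
<!-- Sketch: tag_facet keeps only tokens listed in tags.txt -->
<fieldType name="text_tags" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeepWordFilterFactory" words="tags.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<field name="body"      type="text_general" indexed="true" stored="true"/>
<field name="tag_facet" type="text_tags"    indexed="true" stored="false" multiValued="true"/>
<copyField source="body" dest="tag_facet"/>
```

At index time every token of body that also appears in tags.txt ends up in tag_facet, which can then be faceted on with facet.field=tag_facet.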
lucene operators interfering in edismax
Hi, I am using edismax with end-user entered strings. One search was not finding what appeared to be the best match. The search was: Sage Creek Organics - Enchanted If I remove the -, the doc I want is found with the best score. It turns out (I think) the - is the culprit, as the best match has 'enchanted' and this makes it 'NOT enchanted'. Is my analysis correct? I tried looking at the debug output but saw no NOT entries there... If so, is there a standard way (any filter) to remove Lucene operators from user-entered queries? I thought this must be something usual. thanks javi -- View this message in context: http://lucene.472066.n3.nabble.com/lucene-operators-interfearing-in-edismax-tp3761577p3761577.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: lucene operators interfering in edismax
This should be fixed in trunk by LUCENE-2566 QueryParser: Unary operators +,-,! will not be treated as operators if they are followed by whitespace. -Yonik lucidimagination.com
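[Editor's note: until you are on a version with the LUCENE-2566 fix, a common workaround is to strip lone operator characters from user input before building the query. A rough sketch of such a sanitizer; this is an assumption about the application's input pipeline, not a Solr feature:]

```java
// Sketch: removes +, -, ! when they stand alone (surrounded by whitespace or at
// the start/end of the input), so "Sage Creek Organics - Enchanted" no longer
// negates "Enchanted". Operators attached to a term (e.g. "-foo") are kept.
public class QuerySanitizer {
    public static String stripLoneOperators(String q) {
        return q.replaceAll("(?:^|(?<=\\s))[+\\-!](?=\\s|$)", "")  // drop lone operators
                .replaceAll("\\s+", " ")                            // collapse leftover spaces
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(stripLoneOperators("Sage Creek Organics - Enchanted"));
        // -> Sage Creek Organics Enchanted
    }
}
```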
Exception importing multi-valued UUID field
Hi, I exported a csv file from Solr and made some changes. I then tried to reimport the file and got the exception below. It seems the UUID field type can't import multi-values; I removed all of the multi-values and it imported without an issue. Cheers

org.apache.solr.common.SolrException: Error while creating field 'jobuid{type=uuid,properties=indexed,stored,omitTermFreqAndPositions,multiValued}' from value '845b9db2-2a25-44e3-8eb4-3bf17cd16738,c5477d5d-e77c-45e9-ab61-f7ca05499b37'
    at org.apache.solr.schema.FieldType.createField(FieldType.java:239)
    at org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:276)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
    at org.apache.solr.handler.CSVLoader.doAdd(CSVRequestHandler.java:416)
    at org.apache.solr.handler.SingleThreadedCSVLoader.addDoc(CSVRequestHandler.java:431)
    at org.apache.solr.handler.CSVLoader.load(CSVRequestHandler.java:393)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:300)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:679)
Caused by: org.apache.solr.common.SolrException: Invalid UUID String: '845b9db2-2a25-44e3-8eb4-3bf17cd16738,c5477d5d-e77c-45e9-ab61-f7ca05499b37'
    at org.apache.solr.schema.UUIDField.toInternal(UUIDField.java:85)
    at org.apache.solr.schema.FieldType.createField(FieldType.java:237)
Re: Exception importing multi-valued UUID field
I also tried it with the comma escaped, so: '845b9db2-2a25-44e3-8eb4-3bf17cd16738\,c5477d5d-e77c-45e9-ab61-f7ca05499b37' So that's in the same format as it was exported, Excel must have removed the slash. But I still get the error with the slash.
Re: Is Sphinx better suited to me, or should I look at Solr?
I gave up on sphinx and went to solr. I feel it is more mature. For example, sphinx didn't have an auto start init script and they tried to hit me up for consultancy fees cos I asked a simple question. I use php and use solarium php client. Nice oop interface. Solr has a great community. My initial struggles were with getting it running, mostly because I don't know much about tomcat and it didn't just work for me as documented, but once i stumbled through it was ok. My search results accross 200k documents is instant on a small 512mb rackspacecloud instance so you will have no probs at all using solr for your needs. Sent from my iPhone On 21/02/2012, at 3:32 AM, Em mailformailingli...@yahoo.de wrote: Hi James, I can not speak for Sphinx, since I never used it. However, from reading your requirements there is nothing that fears Solr. Although Sphinx is written in C++, running Solr on top of a HotSpot JVM gives you high performance. Furthermore the HotSpot JVM is optimizing your code at runtime which sometimes allows long-running applications to run as fast as software written in C++ (and sometimes even faster). Given that Solr is pretty fast and scalable (90k docs are a really small index), you should have a closer look at the features each search-server provides to you and how they suit your needs. You should always keep in mind that users will gladly wait a few milliseconds longer for their highly-relevant search-results, but do not care about a blazing fast 5ms response-time for a collection of trash-results. So try to find out what your concrete needs in terms of relevancy are and which search-server provides you the tools to go. I am pretty sure that both projects provide you php-client-libraries etc. for indexing and searching (Solr does). Kind regards, Em Am 20.02.2012 16:20, schrieb Spadez: I am creating what is effectively a search engine. Content is collected via spiders at then is inserted into my database and becomes searchable and filterable. 
I envision there being around 90K records to be searched at any one time. The content is blog posts and forum posts, so we are basically looking at full text with some additional filters based on location, category and date posted. What is really important to me is speed and relevancy; the index size and indexing time really aren't too big of an issue. From the benchmarks I have seen, it looks like Sphinx is much faster at querying data and showing results, but that Solr has better relevancy. My website is coded entirely in PHP and I am planning on using a MySQL database. Can anyone please give me a bit of input and help me decide which product might be better suited to me? Regards, James -- View this message in context: http://lucene.472066.n3.nabble.com/Is-Sphinx-better-suited-to-me-or-should-I-look-at-Solr-tp3760988p3760988.html Sent from the Solr - User mailing list archive at Nabble.com.
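The query shape James describes (one full-text search plus filters on location, category and date posted) maps naturally onto Solr's q and fq parameters. A minimal sketch of building such a request URL; the field names `location`, `category` and `posted` are hypothetical placeholders for whatever the schema actually defines:

```python
from urllib.parse import urlencode

# One scored full-text query (q) plus filter queries (fq).
# Filter queries restrict the result set without affecting scoring,
# and Solr caches them independently, which keeps repeated
# filter combinations fast.
params = [
    ("q", "text:(search terms)"),          # main full-text query, scored
    ("fq", "location:london"),             # hypothetical filter fields
    ("fq", "category:blog"),
    ("fq", "posted:[NOW-30DAYS TO NOW]"),  # date-range filter
    ("rows", "10"),
]
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

Because fq clauses are cached and unscored, putting the location/category/date restrictions there rather than in q is usually the right split for this kind of workload.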
Re: Exception importing multi-valued UUID field
On Mon, Feb 20, 2012 at 7:26 PM, Greg Pelly gfpe...@gmail.com wrote: I exported a CSV file from Solr and made some changes; when I tried to reimport the file I got the exception below. It seems the UUID field type can't import multiple values: once I removed all of the multi-values, it imported without an issue. Did you try split=true? http://wiki.apache.org/solr/UpdateCSV#split -Yonik lucidimagination.com
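To make the split=true suggestion concrete: the failure happens because the whole comma-separated cell is handed to the UUID field as a single value. split=true asks the CSV loader to break it apart first, which a sketch in plain Python illustrates (the UUIDs are the ones from the reported exception):

```python
# What split=true asks Solr's CSV loader to do with a multivalued cell:
# treat the comma-separated string as several values, not one (invalid) UUID.
cell = "845b9db2-2a25-44e3-8eb4-3bf17cd16738,c5477d5d-e77c-45e9-ab61-f7ca05499b37"
values = cell.split(",")
print(values)
```

Without the split, the entire 73-character string reaches UUIDField.toInternal, which rejects it as an invalid UUID.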
Re: Exception importing multi-valued UUID field
I don't think escaping is your problem; you probably want to take that bit out. Try adding f.youruuidfieldname.split=true when importing. You might also have to specify something like f.youruuidfieldname.separator=, but probably not, since I suspect comma is the default. See the split heading at: http://wiki.apache.org/solr/UpdateCSV Although, out of curiosity, I have to ask about your use case: is this some kind of 1-n mapping to other docs? Best Erick On Mon, Feb 20, 2012 at 7:43 PM, Greg Pelly gfpe...@gmail.com wrote: I also tried it with the comma escaped, so: '845b9db2-2a25-44e3-8eb4-3bf17cd16738\,c5477d5d-e77c-45e9-ab61-f7ca05499b37' That's in the same format as it was exported; Excel must have removed the slash. But I still get the error with the slash. On Tue, Feb 21, 2012 at 11:26 AM, Greg Pelly gfpe...@gmail.com wrote: Hi, I exported a CSV file from Solr and made some changes; when I tried to reimport the file I got the exception below. It seems the UUID field type can't import multiple values: once I removed all of the multi-values, it imported without an issue.
Cheers

org.apache.solr.common.SolrException: Error while creating field 'jobuid{type=uuid,properties=indexed,stored,omitTermFreqAndPositions,multiValued}' from value '845b9db2-2a25-44e3-8eb4-3bf17cd16738,c5477d5d-e77c-45e9-ab61-f7ca05499b37'
    at org.apache.solr.schema.FieldType.createField(FieldType.java:239)
    at org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:276)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
    at org.apache.solr.handler.CSVLoader.doAdd(CSVRequestHandler.java:416)
    at org.apache.solr.handler.SingleThreadedCSVLoader.addDoc(CSVRequestHandler.java:431)
    at org.apache.solr.handler.CSVLoader.load(CSVRequestHandler.java:393)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:300)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:679)
Caused by: org.apache.solr.common.SolrException: Invalid UUID String: '845b9db2-2a25-44e3-8eb4-3bf17cd16738,c5477d5d-e77c-45e9-ab61-f7ca05499b37'
    at org.apache.solr.schema.UUIDField.toInternal(UUIDField.java:85)
    at org.apache.solr.schema.FieldType.createField(FieldType.java:237)
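Erick's per-field suggestion can be sketched as the parameter set for a CSV update request. The field name `jobuid` comes from the stack trace; the file path is hypothetical, and the separator is shown only for clarity since, per the UpdateCSV wiki, comma is the default:

```python
from urllib.parse import urlencode

# Parameters for Solr's CSV update handler, enabling per-field splitting
# so the multivalued "jobuid" cell is broken into separate UUID values.
# The stream.file path is a hypothetical example.
params = [
    ("stream.file", "/tmp/solr-export.csv"),
    ("f.jobuid.split", "true"),     # break the cell into multiple values
    ("f.jobuid.separator", ","),    # comma is the default; shown for clarity
    ("commit", "true"),
]
url = "http://localhost:8983/solr/update/csv?" + urlencode(params)
print(url)
```

With split enabled, each UUID in the cell is handed to UUIDField.toInternal individually, so the "Invalid UUID String" exception above should no longer occur.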