Re: Field names with a period (.)
On Thu, May 5, 2011 at 5:08 AM, Leonardo Souza <leonardo...@gmail.com> wrote:
> Hi guys, can I have a field name with a period (.)? Like in *file.size*

I cannot find where this is documented right now, but from what I remember it is recommended to use only the characters A-Z, a-z, 0-9, and underscore (_) in field names; some special characters are known to cause problems. Regards, Gora
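A schema sketch following that convention — the field name and type below are hypothetical, substituting an underscore for the problematic period:

```xml
<!-- schema.xml sketch: stick to A-Z, a-z, 0-9 and underscore in field names. -->
<!-- "file_size" is a hypothetical stand-in for "file.size". -->
<field name="file_size" type="long" indexed="true" stored="true"/>
```

Periods are reported to be especially risky because "." carries special meaning in several query and function-query syntaxes.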
copyField
Another question: if I define different fields with different boosts and then copy them into another field, and then search using this universal field, will the boosting be applied? -- View this message in context: http://lucene.472066.n3.nabble.com/copyField-tp2902242p2902242.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is it possible to use sub-fields or multivalued fields for boosting?
Hello deniz, You could create a new field, say FullName, which is a copyField of firstname and surname. Search on both the new field and location, but boost the new field query higher. Regards, Aditya www.findbestopensource.com

On Thu, May 5, 2011 at 9:21 AM, deniz <denizdurmu...@gmail.com> wrote:
> Okay... let me make the situation clearer. I am trying to create a universal field which includes information about users such as firstname, surname, gender, location, etc. When I enter something, e.g. London, I would like to match any user having 'London' in any of the fields firstname, surname or location. But if it matches firstname or surname, I would like to give it a higher weight. So my question is: is it possible to have sub-fields? Like:
>
> <field name="universal">
>   <field name="firstname">blabla</field>
>   <field name="surname">blabla</field>
>   <field name="gender">blabla</field>
>   <field name="location">blabla</field>
> </field>
>
> Or any other ideas for implementing such a feature? -- View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-use-sub-fields-or-multivalued-fields-for-boosting-tp2901992p2901992.html Sent from the Solr - User mailing list archive at Nabble.com.
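Aditya's suggestion can be sketched with dismax query-time boosts. The field names (FullName, location) follow the thread; the handler name and boost value are hypothetical:

```xml
<!-- solrconfig.xml sketch: weight the copied FullName field over location. -->
<requestHandler name="/usersearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- FullName^3 weights name matches higher; 3 is an arbitrary example value -->
    <str name="qf">FullName^3 location</str>
  </lst>
</requestHandler>
```

A query like q=London against this handler would then match all the copied fields, ranking name matches above location matches.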
How much does Solr enterprise server differ from the non Enterprise server?
I am asking specifically because I am wondering if it is worth my time to read the Enterprise server book, or if there is too much of a divergence between the two. If I read the book, are there any parts of it specifically that won't be relevant? Thanks, Bryan Rasmussen
Re: Patch problems solr 1.4 - solr-2010
Hello, thanks for the answers. I use branch 1.4 and I have successfully applied the SOLR-2010 patch. Now I want to use collated spellchecking. What should my URL look like? I tried this, but it's not working (it behaves the same as Solr without SOLR-2010):

http://localhost:8983/solr/select?q=man united&spellcheck.q=man united&spellcheck=true&spellcheck.build=true&spellcheck.collate=true&spellcheck.collateExtendedResults=true&spellcheck.maxCollations=10&spellcheck.maxCollationTries=10

I get the collation 'man united' as a suggestion. 'Man' is spelled correctly, but not in this phrase; it should be 'manchester united'. I want Solr to re-query the collation and only give the suggestion if it returns some results. How can I fix this? -- View this message in context: http://lucene.472066.n3.nabble.com/Patch-problems-solr-1-4-solr-2010-tp2898443p2902546.html Sent from the Solr - User mailing list archive at Nabble.com.
Does the Solr enable Lemmatization [not the Stemming]
Does Solr support the lemmatization concept? I found documentation stating that Solr supports lemmatization. Here is the link: http://www.basistech.com/knowledge-center/search/2010-09-language-identification-language-support-and-entity-extraction.pdf Can anyone help me find the jar specified in that document so that I can add it as a plugin? jar: rlp.solr.RLPTokenizerFactory Thanks and Regards, Rajani Maski
Re: JsonUpdateRequestHandler
Justine, The JSON update request handler was added in Solr 3.1. Please download this version and try again. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

On 3. mai 2011, at 22.34, Justine Mathews wrote:
Hi, when I add the JSON request handler for updates in solrconfig.xml as below:

<requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler"/>

I get the following error. Version: apache-solr-1.4.1. Could you please help... The error is shown below:

Check your log files for more detailed information on what may be wrong. If you want Solr to continue after configuration errors, change <abortOnConfigurationError>false</abortOnConfigurationError> in solrconfig.xml.

org.apache.solr.common.SolrException: Error loading class 'solr.JsonUpdateRequestHandler' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449) at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:152) at org.apache.solr.core.SolrCore.init(SolrCore.java:556) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594) at org.mortbay.jetty.servlet.Context.startContext(Context.java:139) at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218) at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500) at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147) at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117) at org.mortbay.jetty.Server.doStart(Server.java:210) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40) at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.mortbay.start.Main.invokeMain(Main.java:183) at org.mortbay.start.Main.start(Main.java:497) at org.mortbay.start.Main.main(Main.java:115) Caused by: java.lang.ClassNotFoundException: solr.JsonUpdateRequestHandler at java.net.URLClassLoader$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.net.FactoryURLClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Unknown Source) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359) ... 30 more RequestURI=/solr/ -- Regards, Justine K Mathews, MCSD.NET Mob: +44-(0) 7795268546 http://www.justinemathews.comhttp://www.justinemathews.com/ http://uk.linkedin.com/in/justinemathews
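For reference, once on Solr 3.1 the registration Jan refers to should load cleanly; a sketch based on the 3.1 example solrconfig.xml (verify against the copy that ships with your download):

```xml
<!-- Solr 3.1+: solr.JsonUpdateRequestHandler ships with the distribution,
     so this line no longer throws ClassNotFoundException. -->
<requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler"/>
```

Documents can then be POSTed as JSON to /solr/update/json; on 1.4.1 the class simply does not exist on the classpath, which is what the stack trace above shows.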
Re: copyField
> If I define different fields with different boosts and then copy them into another field and search using this universal field, will the boosting be applied?

No. copyField just copies the raw content; boosts on the source fields are not carried over to the destination field.
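What copyField actually does can be sketched in schema.xml — field and type names here are hypothetical:

```xml
<!-- schema.xml sketch: the raw (pre-analysis) content of the source fields
     is appended to "universal"; source-field boosts do not travel with it. -->
<field name="firstname" type="text" indexed="true" stored="true"/>
<field name="surname"   type="text" indexed="true" stored="true"/>
<field name="universal" type="text" indexed="true" stored="false" multiValued="true"/>

<copyField source="firstname" dest="universal"/>
<copyField source="surname"   dest="universal"/>
```

To weight the sources differently, apply query-time boosts (e.g. dismax qf) rather than relying on index-time boosts surviving the copy.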
Re: How much does Solr enterprise server differ from the non Enterprise server?
Hi, Solr IS an enterprise search server, and there is only one edition :) I'd wait a few more weeks until the Solr 3.1 books are available, and then read up on it. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

On 5. mai 2011, at 09.37, bryan rasmussen wrote:
> I am asking specifically because I am wondering if it is worth my time to read the Enterprise server book, or if there is too much of a divergence between the two. If I read the book, are there any parts of it specifically that won't be relevant? Thanks, Bryan Rasmussen
Re: Does the Solr enable Lemmatization [not the Stemming]
Hi, Solr does not have lemmatization out of the box. You'll have to find third-party analyzers; the best known is from BasisTech. Please contact them to learn more. I'm not aware of any open-source lemmatizers for Solr. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

On 5. mai 2011, at 10.34, rajini maski wrote:
> Does Solr support the lemmatization concept? I found documentation stating that Solr supports lemmatization. Here is the link: http://www.basistech.com/knowledge-center/search/2010-09-language-identification-language-support-and-entity-extraction.pdf Can anyone help me find the jar specified in that document so that I can add it as a plugin? jar: rlp.solr.RLPTokenizerFactory Thanks and Regards, Rajani Maski
Re: How much does Solr enterprise server differ from the non Enterprise server?
Ok, I just saw the note about syncing the version numbers. Is there any information on these Solr 3.1 books? Publishers, publication dates, websites? Mvh, Bryan Rasmussen

On Thu, May 5, 2011 at 10:57 AM, Jan Høydahl <jan@cominvent.com> wrote:
> Hi, Solr IS an enterprise search server, and there is only one edition :) I'd wait a few more weeks until the Solr 3.1 books are available, and then read up on it. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
>> On 5. mai 2011, at 09.37, bryan rasmussen wrote: I am asking specifically because I am wondering if it is worth my time to read the Enterprise server book or if there is too much of a divergence between the two. If I read the book, are there any parts of it specifically that won't be relevant? Thanks, Bryan Rasmussen
Why is org.apache.solr.response.XMLWriter final?
Hello, It's final in the trunk, and has been ever since its conception in 2006 at revision 372455. Why? -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours, then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact, or the email does not contain a valid code, then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
Format date before indexing it
Hi, I have to index records that have fields containing a date. This date can be: 2011, 2011-05, or 2015-05-01. The separators can also be slashes. I'd like to convert these values into a valid date for Solr. So my question is: what is the best way to achieve this? 1) Use solr.DateField and write my own filter so that I get the date in the right format? 2) Subclass solr.DateField? Thanks in advance, Marc.
Is it possible to load all indexed data in search request
Hi, I can load all indexed data using a /select request with *:* as the query parameter. I tried the same with a /search request but it didn't work; it didn't work with * as the query value either. I am using the dismax handler. Is it possible to load all indexed data in search and suggest requests? -- View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-load-all-indexed-data-in-search-request-tp2902808p2902808.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is it possible to load all indexed data in search request
On Thu, May 5, 2011 at 3:48 PM, Kannan ramkannan2...@gmail.com wrote: Hi I can load all indexed data using /select request and query param as *:*. I tried same with /Search request but it didn't work. Even it didn't work for * as query value. I am using disMax handler. Is it possible to load all indexed data in search and suggest request? If I understand correctly, you are trying to retrieve all Solr records in one go: Question 3.8 in the FAQ ( http://wiki.apache.org/solr/FAQ ) addresses this. Regards, Gora
Re: Is it possible to load all indexed data in search request
> I am using the dismax handler. Is it possible to load all indexed data in search and suggest requests?

With dismax, you can use the q.alt=*:* parameter. Don't use the q parameter at all.
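The q.alt fallback can also be baked into the handler's defaults so an empty query returns everything; a sketch (the handler name is hypothetical):

```xml
<!-- solrconfig.xml sketch: a dismax handler that matches all documents
     whenever no q parameter is supplied. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>
```

With this in place, a request to /search with no q at all behaves like /select?q=*:*.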
Re: Format date before indexing it
--- On Thu, 5/5/11, Marc SCHNEIDER <marc.schneide...@gmail.com> wrote:
> Hi, I have to index records that have fields containing a date. This date can be: 2011, 2011-05, or 2015-05-01. The separators can also be slashes. I'd like to convert these values into a valid date for Solr. So my question is: what is the best way to achieve this? 1) Use solr.DateField and write my own filter so that I get the date in the right format? 2) Subclass solr.DateField?

http://wiki.apache.org/solr/UpdateRequestProcessor or http://wiki.apache.org/solr/DataImportHandler#Transformer if you are using DIH.
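If the dates arrive fully qualified, the DIH route can be sketched like this (entity, query, and column names are hypothetical; variable-precision values such as "2011" or "2011-05" would first need padding, e.g. via a custom transformer):

```xml
<!-- data-config.xml sketch: normalize slashes, then parse into a Solr date. -->
<entity name="record" transformer="RegexTransformer,DateFormatTransformer"
        query="select id, raw_date from records">
  <!-- RegexTransformer: replace "/" separators with "-" -->
  <field column="raw_date" sourceColName="raw_date" regex="/" replaceWith="-"/>
  <!-- DateFormatTransformer: parse "2011-05-01" into the date field -->
  <field column="date" sourceColName="raw_date" dateTimeFormat="yyyy-MM-dd"/>
</entity>
```

For non-DIH indexing, the equivalent normalization would live in a custom UpdateRequestProcessor, as the wiki page above describes.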
Re: Does the Solr enable Lemmatization [not the Stemming]
Rajani You might also want to look at Balie ( http://balie.sourceforge.net/ ), from the web site: Features: • language identification • tokenization • sentence boundary detection • named-entity recognition Can't vouch for it though. On May 5, 2011, at 4:58 AM, Jan Høydahl wrote: Hi, Solr does not have lemmatization out of the box. You'll have to find 3rd party analyzers, and the most known such is from BasisTech. Please contact them to learn more. I'm not aware of any open source lemmatizers for Solr. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 5. mai 2011, at 10.34, rajini maski wrote: Does the solr enable lemmatization concept? I found a documentation that gives an information as solr enables lemmatization concept. Here is the link : http://www.basistech.com/knowledge-center/search/2010-09-language-identification-language-support-and-entity-extraction.pdf Can anyone help me finding the jar specified in that document so that i can add it as plugin. jar :rlp.solr.RLPTokenizerFactory Thanks and Regards, Rajani Maski
[ann] Lily 1.0 is out: Smart Data at Scale, made Easy!
Hi all, We’re really proud to release the first official major release of Lily, our flagship repository for scalable data and content management, after 18 months of intense engineering work. We’re thrilled to be the first to launch an open source, general-purpose, highly scalable yet flexible data repository based on NOSQL/BigData technology: read all about it below.

What: Lily is a data and content repository made for the Age of Data: it allows you to store and manage vast amounts of data, and in the future will allow you to monetize user interactions by tracking and analyzing audience data. Lily makes Big Data easy with a high-level, developer-friendly data model with rich types, versioning and schema management. Lily offers simple Java and REST APIs for creating, reading and managing data. Its flexible indexing mechanism supports interactive and batch-oriented index maintenance. Lily is the foundation for any large-scale data-centric application: social media, e-commerce, large content management applications, product catalogs, archiving, media asset management: any data-centric application with an ambition to scale beyond a single-server setup. Lily is dead serious about scale: the Lily repository has been tested to scale beyond any common content repository technology out there, thanks to its inherently distributed architecture, providing economically affordable, robust, and high-performing data management services for any kind of enterprise application.

For whom: Lily puts BigData technology within reach of enterprise and corporate developers, wrapping leading-edge technology in a developer- and administrator-friendly package. Lily offers the flexibility and scalability of Apache HBase, the de-facto leading Google BigTable implementation, and the sophistication and robustness of Apache Solr, the market leader in open source enterprise and internet search. Lily sits on the shoulders of these Big Data revolution leaders, and provides the additional ease of use needed for corporate adoption.

Thanks: Lily builds upon the best data and search technology out there: Apache HBase and Solr. HBase is in use at some of the largest data properties out there: Facebook, StumbleUpon and Yahoo! Solr is rapidly replacing proprietary enterprise search solutions all over the place and is one of the most popular open source projects at the Apache Software Foundation. We're thankful for the developer communities working hard on these projects, and strive to contribute back where possible. We're also appreciative of the commercial service suppliers backing these projects: Lucid Imagination and Cloudera.

Where: Everything Lily can be found at www.lilyproject.org. Enjoy! Thanks, The Lily team @ http://outerthought.org/ Outerthought: Scalable Smart Data, made Easy. Makers of Kauri, Daisy CMS and Lily
Programmatic restructuring of a Solr cloud
Dear Solr Experts, First of all, I would like to thank you for your patience when answering questions of those who are less experienced. And now to the main topic: I would like to learn whether it is possible to restructure a Solr cloud programmatically. Let me describe the system we are designing to make the requirements clear. The indexed documents are certain log entries. We are planning to shard them by month, and only keep the last 12 months in the index. We are going to replicate each shard across several servers. Now, the user is always required to search within a single month (= shard). Most importantly, we expect an absolute majority of the requests to query the current month, with only a minor load on the previous months. In order to utilise the cluster most efficiently, we would like a majority of the servers to contain replicas of the current month data, and have only one or two servers per older month. To this end, we are planning to have a set of slaves that migrate from master to master, depending on which master holds the data for the current month. When a new month starts, those slaves have to be reconfigured to hold the new shard and to replicate from the new master (their old master now holding the data for the previous month). Since this operation has to be done every month, we are naturally considering automating it. So my question is whether anyone has faced a similar problem before, and what is the best way to solve it. We are not committed to any solution, or even architecture, so feel free to propose different solutions. The only requirement is that a majority of the servers should be able to serve requests to the current month at any given moment. Thank you in advance for your answers. Best regards, Sergey Sazonov.
Re: why query chinese character with bracket become phrase query by default?
Unfortunately, the current out-of-the-box defaults (example config) for Solr are a disaster for non-whitespace languages (CJK, Thai, etc.), which is exactly what you've hit. This is because Lucene's QueryParser can unexpectedly, dangerously, create a PhraseQuery even when the user did not ask for it (auto-phrasing). Not only does this mean no results for non-whitespace languages, it also means worse search performance (PhraseQuery is usually more costly than TermQuery). Lucene leaves this auto-phrase behavior off by default, but Solr defaults it to on. Robert's email gives a good description of how you can turn it off. The very first thing every non-whitespace-language Solr app should do is turn off autoGeneratePhraseQueries! Mike http://blog.mikemccandless.com

On Wed, May 4, 2011 at 8:21 PM, cyang2010 <ysxsu...@hotmail.com> wrote:
> Hi, in the Solr admin full query interface page, the following English query becomes term queries according to debug: title_en_US: (blood red)
>
> <lst name="debug">
>   <str name="rawquerystring">title_en_US: (blood red)</str>
>   <str name="querystring">title_en_US: (blood red)</str>
>   <str name="parsedquery">title_en_US:blood title_en_US:red</str>
>   <str name="parsedquery_toString">title_en_US:blood title_en_US:red</str>
> </lst>
>
> However, the same syntax with two Chinese terms results in a phrase query: title_zh_CN: (我活)
>
> <lst name="debug">
>   <str name="rawquerystring">title_zh_CN: (我活)</str>
>   <str name="querystring">title_zh_CN: (我活)</str>
>   <str name="parsedquery">PhraseQuery(title_zh_CN:"我 活")</str>
>   <str name="parsedquery_toString">title_zh_CN:"我 活"</str>
> </lst>
>
> I do have different tokenizers/filters for those two fields: title_en_US uses the common English-specific tokenizers, while title_zh_CN uses solr.ChineseTokenizerFactory. I don't think the tokenizers determine whether terms within brackets become term queries or phrase queries. I really need to blindly pass user-input text to a Solr field without doing any parsing, and have it run a term query for each term in the search text. How do I achieve that? Thanks, cy -- View this message in context: http://lucene.472066.n3.nabble.com/why-query-chinese-character-with-bracket-become-phrase-query-by-default-tp2901542p2901542.html Sent from the Solr - User mailing list archive at Nabble.com.
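The fix Mike describes is a per-fieldType switch (Solr 3.1+); a sketch, with the analyzer chain reduced to just the tokenizer this thread mentions:

```xml
<!-- schema.xml sketch: disable auto-phrasing for the Chinese field type,
     so adjacent tokens parse as separate TermQuerys, not a PhraseQuery. -->
<fieldType name="text_zh" class="solr.TextField"
           autoGeneratePhraseQueries="false" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ChineseTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With this, title_zh_CN: (我活) should parse the way the English example does: one term query per token.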
How do I debug Unable to evaluate expression using this context printed at start?
I've tried to re-install Solr on Tomcat, and now when I launch Tomcat in debug mode I see the following exception relating to Solr. It's not enough to understand the problem (and fix it), and I don't know where to look for more (or what to do). Please help me. Following the tutorial and discussion here, this is my context descriptor (solr.xml):

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/Users/simpatico/SOLR_HOME/dist/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/Users/simpatico/SOLR_HOME" override="true"/>
</Context>

(the war exists)
$ ls $SOLR_HOME/dist/solr.war
/Users/simpatico/SOLR_HOME//dist/solr.war
$ ls $SOLR_HOME/conf/solrconfig.xml
/Users/simpatico/SOLR_HOME//conf/solrconfig.xml

When Tomcat starts:
INFO: Using JNDI solr.home: /Users/simpatico/SOLR_HOME
May 5, 2011 2:46:50 PM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/Users/simpatico/SOLR_HOME/'
...
INFO: Adding 'file:/Users/simpatico/SOLR_HOME/lib/wstx-asl-3.2.7.jar' to classloader
May 5, 2011 2:46:50 PM org.apache.solr.common.SolrException log
SEVERE: javax.xml.transform.TransformerException: Unable to evaluate expression using this context at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:363) at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213) at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275) at org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:303) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:242) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254) at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:98) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4382) at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5040) at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5035) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) Caused by: java.lang.RuntimeException: Unable to evaluate expression using this context at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:212) at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:210) at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:335) ... 
18 more - java.lang.RuntimeException: Unable to evaluate expression using this context at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:212) at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:210) at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:335) at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213) at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275) at org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:303) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:242) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:98) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4382) at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5040) at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5035) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) ---
Re: Programmatic restructuring of a Solr cloud
Hi, One approach, if you're using Amazon, is Elastic Beanstalk:

* Create one master with 12 cores, named jan, feb, mar, etc. Every month, you clear the current month's index and switch indexing to it. You will only have one master, because you're only indexing to one month at a time.
* For each of the 12 months, set up an Amazon Beanstalk instance with a Solr replica pointing to its master. This way, Amazon will spin off replicas as needed. NOTE: your replica could still be located at /solr/select even if it replicates from /solr/may/replication.
* You only query the replicas, and the client controls whether to query one or more shards: shards=jan.elasticbeanstalk.com/solr,feb.elasticbeanstalk.com/solr,mar.elasticbeanstalk.com/solr

After this is set up, you have 0 config to worry about :) -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

On 5. mai 2011, at 14.03, Sergey Sazonov wrote: Dear Solr Experts, First of all, I would like to thank you for your patience when answering questions of those who are less experienced. And now to the main topic: I would like to learn whether it is possible to restructure a Solr cloud programmatically. Let me describe the system we are designing to make the requirements clear. The indexed documents are certain log entries. We are planning to shard them by month, and only keep the last 12 months in the index. We are going to replicate each shard across several servers. Now, the user is always required to search within a single month (= shard). Most importantly, we expect an absolute majority of the requests to query the current month, with only a minor load on the previous months. In order to utilise the cluster most efficiently, we would like a majority of the servers to contain replicas of the current month data, and have only one or two servers per older month. To this end, we are planning to have a set of slaves that migrate from master to master, depending on which master holds the data for the current month. 
When a new month starts, those slaves have to be reconfigured to hold the new shard and to replicate from the new master (their old master now holding the data for the previous month). Since this operation has to be done every month, we are naturally considering automating it. So my question is whether anyone has faced a similar problem before, and what is the best way to solve it. We are not committed to any solution, or even architecture, so feel free to propose different solutions. The only requirement is that a majority of the servers should be able to serve requests to the current month at any given moment. Thank you in advance for your answers. Best regards, Sergey Sazonov.
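The migrate-a-slave-to-a-new-master step amounts to rewriting the slave's replication config and reloading the core; a sketch (host and core names are hypothetical):

```xml
<!-- solrconfig.xml on a slave: point it at the current month's master core.
     Automating the monthly switch = regenerating masterUrl and reloading. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/may/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

A monthly cron job (or the Beanstalk approach above) would rewrite masterUrl from the old month's core to the new one and trigger a core reload via the CoreAdmin API.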
Controlling webapp startup
There are two ways to characterize what I'd like to do:
1) Use the EmbeddedSolrServer to launch Solr, and subsequently enable the HTTP GET/JSON servlet. I can provide the 'servlet' wiring; I just need to be able to hand an HttpServletRequest to something and retrieve in return the same JSON that would come back from the usual Solr servlet.
2) Use the usual Solr servlet apparatus, but defer its startup until other code in the webapp makes up its mind about configuration and calls System.setProperty to locate the Solr home and data directories.
fast case-insensitive autocomplete
Hi. I need an autocomplete solution that handles case-insensitive queries but returns the original text with the case still intact. I've experimented with both the Suggester and TermsComponent methods. TermsComponent works when I use the regex option; however, it is far too slow. I get the speed I want by using terms.prefix, or by using the Suggester, but those are case sensitive. Here is an example operating on a user directory: Query: bran Results: Branden Smith, Brandon Thompson, Brandon Verner, Brandy Finny, Brian Smith, ... A solution that I would expect to work would be to store two fields: one containing the original text and the other containing the lowercased text. Then convert the query to lower case, run it against the lowercase field, and return the original (case-preserved) field. Unfortunately, I can't get a TermsComponent query to return additional fields; it only returns the field it's searching against. Should this work, or can I only return additional fields for standard queries? Thanks in advance, Brandyn
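The two-field approach described above can be sketched in schema.xml — the field and type names are hypothetical:

```xml
<!-- schema.xml sketch: query the lowercased copy, display the original. -->
<fieldType name="name_prefix" class="solr.TextField">
  <analyzer>
    <!-- keep the whole name as one token, lowercased for matching -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name"    type="string"      indexed="true" stored="true"/>
<field name="name_lc" type="name_prefix" indexed="true" stored="false"/>
<copyField source="name" dest="name_lc"/>
```

A regular (non-TermsComponent) prefix query such as q=name_lc:bran* with fl=name would then return the original casing; whether that meets the speed requirement depends on the index.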
RE: Is it possible to build Solr as a maven project?
Hi Gabriele, The sequence should be 1. svn update 2. ant get-maven-poms 3. mvn -N -Pbootstrap install I think you left out #2 - there was a very recent change to the POMs that affects the noggit jar name. Steve -Original Message- From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com] Sent: Thursday, May 05, 2011 1:22 AM To: solr-user@lucene.apache.org Subject: Re: Is it possible to build Solr as a maven project? Thank you so much for this gem, David! I still don't manage to build though: $ svn update At revision 1099684. $ mvn clean $ mvn -N -Pbootstrap install [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 8.234s [INFO] Finished at: Thu May 05 07:21:34 CEST 2011 [INFO] Final Memory: 12M/81M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-install-plugin:2.3.1:install-file (install-solr-noggit) on project lucene-solr-grandparent: Error installing artifact 'org.apache.solr:solr-noggit:jar': Failed to install artifact org.apache.solr:solr-noggit:jar:4.0-SNAPSHOT: /Users/simpatico/debug/solr4/solr/lib/apache-solr-noggit-r944541.jar (No such file or directory) - [Help 1] On Thu, May 5, 2011 at 12:02 AM, Smiley, David W. dsmi...@mitre.org wrote: Hi folks. What you're supposed to do is run: mvn -N -Pbootstrap install as the very first one-time only step. It copies several custom jar files into your local repository. From then on you can build like normally with maven. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/ On May 4, 2011, at 2:36 PM, Gabriele Kahlout wrote: but it doesn't build. 
Now, I've checked out solr4 from the trunk and tried to build the maven project there, but it fails downloading BerkeleyDB: BUILD FAILURE --- Total time: 1:07.367s Finished at: Wed May 04 20:33:29 CEST 2011 Final Memory: 24M/81M --- Failed to execute goal on project lucene-bdb: Could not resolve dependencies for project org.apache.lucene:lucene-bdb:jar:4.0-SNAPSHOT: Failure to find com.sleepycat:berkeleydb:jar:4.7.25 in http://download.carrot2.org/maven2/ was cached in the local repository, resolution will not be reattempted until the update interval of carrot2.org has elapsed or updates are forced - [Help 1] I looked to get the jar on my own, but I didn't find a 4.7.25 version; the latest on the Oracle website (Java Edition) is 4.1. Where can I download this maven dependency from? On Wed, May 4, 2011 at 1:26 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: It worked after checking out the dev-tools folder. Thank you! On Wed, May 4, 2011 at 1:20 PM, lboutros boutr...@gmail.com wrote:

<property name="version" value="3.1-SNAPSHOT"/>
<target name="get-maven-poms"
        description="Copy Maven POMs from dev-tools/maven/ to their target locations">
  <copy todir="." overwrite="true">
    <fileset dir="${basedir}/dev-tools/maven/"/>
    <filterset begintoken="@" endtoken="@">
      <filter token="version" value="${version}"/>
    </filterset>
    <globmapper from="*.template" to="*"/>
  </copy>
</target>

-- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
Text Only Extraction Using Solr and Tika
Hi All, I have Solr and Tika installed and am happily extracting and indexing various files. Unfortunately, on some Word documents it blows up, since it tries to auto-generate a 'title' field but my title field in the schema is single-valued. Here is my config for the extract handler...

<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

Is there a config option to make it only extract text, or ideally to allow me to specify which metadata fields to accept? E.g. I'd like to use any author metadata it finds, but not any title metadata, as I want title to be single-valued and set explicitly using literal.title in the post request. I did look around for some docs, but all I can find are very basic examples; there's no comprehensive configuration documentation out there as far as I can tell. ALSO... I get some other bad responses coming back, such as this Tomcat error page (style markup trimmed): HTTP Status 500 - org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator; java.lang.NoSuchMethodError: org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator; at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:168) at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:148) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:636) For the above my url was... http://localhost:8080/solr/update/extract?literal.id=3922&defaultField=content&fmap.content=content&uprefix=ignored_&stream.contentType=application%2Fvnd.ms-powerpoint&commit=true&literal.title=Reactor+cycle+141&literal.notes=&literal.tag=UCN_production&literal.author=Maurits+van+der+Grinten I guess there's something special I need to be able to process PowerPoint files? Maybe I need to get the latest Apache POI? Any suggestions welcome... Regards, Emyr
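One possible way to keep Tika's auto-detected title out of the single-valued field, sketched from the fmap.* input parameters documented on the ExtractingRequestHandler wiki page (untested; the ignored_title name assumes the schema has an ignored_* dynamic field, as the uprefix setting already implies):

```xml
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">ignored_</str>
    <!-- Divert Tika's extracted "title" metadata into an ignored dynamic
         field, so the literal.title sent with the request stays the only
         value in the single-valued title field. -->
    <str name="fmap.title">ignored_title</str>
  </lst>
</requestHandler>
```

The same fmap.* parameters can also be passed per request on the URL instead of being fixed in the handler defaults.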
Re: Text Only Extraction Using Solr and Tika
Hi Emyr, You could try using the extractOnly=true parameter [1]. Of course, you'll need to repost the extracted text manually. --jay [1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only On Thu, May 5, 2011 at 9:36 AM, Emyr James emyr.ja...@sussex.ac.uk wrote: Hi All, I have solr and tika installed and am happily extracting and indexing various files. [...]
Re: Is it possible to build Solr as a maven project?
Okay, that sequence worked, but then shouldn't I be able to do $ mvn install afterwards? This is what I get: ... Compiling 478 source files to /Users/simpatico/debug/solr4/solr/build/solr - COMPILATION ERROR:
org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27] package com.google.common.io does not exist
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package com.google.common.collect does not exist
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[29,27] package com.google.common.io does not exist
org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[29,4] cannot find symbol - symbol: variable ByteStreams, location: class org.apache.solr.spelling.suggest.fst.InputStreamDataInput
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[128,57] cannot find symbol - symbol: variable Lists, location: class org.apache.solr.spelling.suggest.fst.FSTLookup
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[170,26] cannot find symbol - symbol: variable Lists, location: class org.apache.solr.spelling.suggest.fst.FSTLookup
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[203,35] cannot find symbol - symbol: variable Lists, location: class org.apache.solr.spelling.suggest.fst.FSTLookup
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[529,6] cannot find symbol - symbol: variable Closeables, location: class org.apache.solr.spelling.suggest.fst.FSTLookup
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[551,6] cannot find symbol - symbol: variable Closeables, location: class org.apache.solr.spelling.suggest.fst.FSTLookup
9 errors

Reactor Summary:
Grandparent POM for Apache Lucene Java and Apache Solr  SUCCESS [13.255s]
Lucene parent POM ..................................... SUCCESS [0.199s]
Lucene Core ........................................... SUCCESS [15.528s]
Lucene Test Framework ................................. SUCCESS [4.657s]
Lucene Common Analyzers ............................... SUCCESS [16.770s]
Lucene Contrib Ant .................................... SUCCESS [1.103s]
Lucene Contrib bdb .................................... SUCCESS [0.883s]
Lucene Contrib bdb-je ................................. SUCCESS [0.872s]
Lucene Database aggregator POM ........................ SUCCESS [0.091s]
Lucene Demo ........................................... SUCCESS [0.842s]
Lucene Memory ......................................... SUCCESS [0.726s]
Lucene Queries ........................................ SUCCESS [1.559s]
Lucene Highlighter .................................... SUCCESS [3.007s]
Lucene InstantiatedIndex .............................. SUCCESS [1.224s]
Lucene Lucli .......................................... SUCCESS [1.579s]
Lucene Miscellaneous .................................. SUCCESS [1.163s]
Lucene Query Parser ................................... SUCCESS [4.274s]
Lucene Spatial ........................................ SUCCESS [1.159s]
Lucene Spellchecker ................................... SUCCESS [0.841s]
Lucene Swing .......................................... SUCCESS [1.177s]
Lucene Wordnet ........................................ SUCCESS [0.816s]
Lucene XML Query Parser ............................... SUCCESS [1.197s]
Lucene Contrib aggregator POM ......................... SUCCESS [0.079s]
Lucene ICU Analysis Components ........................ SUCCESS [1.494s]
Lucene Phonetic Filters ............................... SUCCESS [0.759s]
Lucene Smart Chinese Analyzer ......................... SUCCESS [3.534s]
Lucene Stempel Analyzer ............................... SUCCESS [1.537s]
Lucene Analysis Modules aggregator POM ................ SUCCESS [0.081s]
Lucene Benchmark ...................................... SUCCESS [3.693s]
Lucene Modules aggregator POM ......................... SUCCESS [0.147s]
Apache Solr parent POM ................................ SUCCESS [0.099s]
Apache Solr Solrj ..................................... SUCCESS [3.670s]
Apache Solr Core ...................................... FAILURE [7.842s]

On Thu, May 5, 2011 at 3:36 PM, Steven A Rowe sar...@syr.edu wrote: Hi Gabriele, The sequence should be 1. svn update 2. ant get-maven-poms 3. mvn -N -Pbootstrap install I think you left out #2 - there was a very recent change to the POMs that affects the noggit jar name. Steve [...]
Re: Text Only Extraction Using Solr and Tika
Thanks for the suggestion but there surely must be a better way than that to do it? I don't want to post the whole file up, get it extracted on the server, send the extracted text back to the client, then send it all back up to the server again as plain text. On 05/05/11 14:55, Jay Luker wrote: Hi Emyr, You could try using the extractOnly=true parameter [1]. Of course, you'll need to repost the extracted text manually. --jay [1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only [...]
Re: why query chinese character with bracket become phrase query by default?
2011/5/5 Michael McCandless luc...@mikemccandless.com: The very first thing every non-whitespace language Solr app should do is turn off autoGeneratePhraseQueries! Luckily, this is configurable per FieldType... so if it doesn't exist yet, we should come up with a good CJK fieldtype to add to the example schema. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
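A minimal sketch of what such a field type might look like in schema.xml (the analyzer chain here is illustrative, not a proposal from the thread; the essential part is the autoGeneratePhraseQueries attribute, which as Yonik notes is configurable per field type):

```xml
<fieldType name="text_cjk" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer>
    <!-- autoGeneratePhraseQueries="false" is what stops a run of CJK
         characters from being turned into an implicit phrase query;
         any CJK-aware analysis chain could sit below it. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```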
Re: Text Only Extraction Using Solr and Tika
Hi Emyr, You can try the XPath based approach and see if that works. Also, see if dynamic fields can help you for the metadata fields. References: http://wiki.apache.org/solr/SchemaXml#Dynamic_fields http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput Regards, Anuj On Thu, May 5, 2011 at 7:28 PM, Emyr James emyr.ja...@sussex.ac.uk wrote: Thanks for the suggestion but there surely must be a better way than that to do it? I don't want to post the whole file up, get it extracted on the server, send the extracted text back to the client then send it all back up to the server again as plain text. [...]
Re: Field names with a period (.)
Thanks Gora! [ ]'s Leonardo da S. Souza °v° Linux user #375225 /(_)\ http://counter.li.org/ ^ ^ On Thu, May 5, 2011 at 3:09 AM, Gora Mohanty g...@mimirtech.com wrote: [...]
RE: Patch problems solr 1.4 - solr-2010
There is still a functionality gap in Solr's spellchecker even with SOLR-2010 applied. If a user enters a word that is in the dictionary, Solr will never try to correct it. The only way around this is to use spellcheck.onlyMorePopular. The problem with this approach is that onlyMorePopular causes the spellchecker to assume *every* word in the query is a misspelling, and it won't even consider the original terms in building collations. What is needed is a hybrid option that would try to build collations using combinations of original terms, corrected terms, and more popular terms. To my knowledge, there is no way to get the spellchecker to do that currently. On the other hand, if you're pretty sure man is not in the dictionary, try upping spellcheck.count to something higher than the default (20 maybe?)... James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -----Original Message----- From: roySolr [mailto:royrutten1...@gmail.com] Sent: Thursday, May 05, 2011 3:24 AM To: solr-user@lucene.apache.org Subject: Re: Patch problems solr 1.4 - solr-2010 Hello, thanks for the answers. I use branch 1.4 and I have successfully patched SOLR-2010. Now I want to use collated spellchecking. What should my URL look like? I tried this, but it's not working (it's the same as Solr without SOLR-2010): http://localhost:8983/solr/select?q=man+unitet&spellcheck.q=man+unitet&spellcheck=true&spellcheck.build=true&spellcheck.collate=true&spellcheck.collateExtendedResult=true&spellcheck.maxCollations=10&spellcheck.maxCollationTries=10 I get the collation man united as a suggestion. Man is spelled correctly, but not in this phrase. It should be manchester united, and I want Solr to re-query the collation and only give the suggestion if it returns some results. How can I fix this? -- View this message in context: http://lucene.472066.n3.nabble.com/Patch-problems-solr-1-4-solr-2010-tp2898443p2902546.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Highlighting words with non-ascii chars
Thanks for the suggestion, Peter; the problem was elsewhere though - somewhere in the highlighting module. I've fixed it by adding (into the field definition in schema.xml) a custom Czech charFilter (mappings such as "í" => "i") - then it started to work as expected. Cheers, Pavel Peter Wolanin wrote on Mon, 02. 05. 2011 at 17:38 +0200: Does your servlet container have the URI encoding set correctly, e.g. URIEncoding="UTF-8" for tomcat6? http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config Older versions of Jetty use ISO-8859-1 as the default URI encoding, but Jetty 6 should use UTF-8 as default: http://docs.codehaus.org/display/JETTY/International+Characters+and+Character+Encodings -Peter On Sat, Apr 30, 2011 at 6:31 AM, Pavel Kukačka pavel.kuka...@seznam.cz wrote: Hello, I've hit a (probably trivial) roadblock I don't know how to overcome with Solr 3.1: I have a document with common fields (title, keywords, content) and I'm trying to use highlighting. With queries using ASCII characters there is no problem; it works smoothly. However, when I search using a Czech word including non-ASCII chars (like slovíčko for example - http://localhost:8983/solr/select/?q=slov%C3%AD%C4%8Dko&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*), the document is found, but the response doesn't contain the highlighted snippet in the highlighting node - there is only an empty node, like this:

<lst name="highlighting">
  <lst name="2009"/>
</lst>

When searching for the other keyword (http://localhost:8983/solr/select/?q=slovo&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*), the resulting response is fine, like this:

<lst name="highlighting">
  <lst name="2009">
    <arr name="user_keywords">
      <str>slovíčko <em id="highlighting">slovo</em></str>
    </arr>
  </lst>
</lst>

Did anyone come across this problem? Cheers, Pavel
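The charFilter fix Pavel describes could look roughly like this in the field type (the mapping filename and its contents are illustrative, not taken from the thread):

```xml
<fieldType name="text_cz" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Fold accented Czech characters before tokenizing; the mapping
         file would contain lines such as:  "í" => "i"  -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-czech.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because a charFilter runs before the tokenizer, the same folding is applied at both index and query time, which is what makes the highlighter's offsets line up for accented input.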
Re: Text Only Extraction Using Solr and Tika
Hi, I'm not really sure how these can help with my problem. Can you give a bit more info on this ? I think what i'm after is a fairly common request.. http://lucene.472066.n3.nabble.com/Controlling-Tika-s-metadata-td2378677.html http://lucene.472066.n3.nabble.com/Select-tika-output-for-extract-only-td499059.html#a499062 Did the change that Yonik Seely mentions to allow more control over the output ever make it into 1.4 ? Regards, Emyr On 05/05/11 15:01, Anuj Kumar wrote: Hi Emyr, You can try the XPath based approach and see if that works. Also, see if dynamic fields can help you for the meta data fields. References- http://wiki.apache.org/solr/SchemaXml#Dynamic_fields http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput Regards, Anuj On Thu, May 5, 2011 at 7:28 PM, Emyr Jamesemyr.ja...@sussex.ac.uk wrote: Thanks for the suggestion but there surely must be a better way than that to do it ? I don't want to post the whole file up, get it extracted on the server, send the extracted text back to the client then send it all back up to the server again as plain text. On 05/05/11 14:55, Jay Luker wrote: Hi Emyr, You could try using the extractOnly=true parameter [1]. Of course, you'll need to repost the extracted text manually. --jay [1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only On Thu, May 5, 2011 at 9:36 AM, Emyr Jamesemyr.ja...@sussex.ac.uk wrote: Hi All, I have solr and tika installed and am happily extracting and indexing various files. Unfortunately on some word documents it blows up since it tries to auto-generate a 'title' field but my title field in the schema is single valued. Here is my config for the extract handler... 
<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
Is there a config option to make it only extract text, or ideally to allow me to specify which metadata fields to accept? E.g. I'd like to use any author metadata it finds but to not use any title metadata it finds, as I want title to be single valued and set explicitly using a literal.title in the post request. I did look around for some docs but all I can find are very basic examples; there's no comprehensive configuration documentation out there as far as I can tell. ALSO... I get some other bad responses coming back such as...
HTTP Status 500 - java.lang.NoSuchMethodError: org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:168)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:148)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
Re: Text Only Extraction Using Solr and Tika
Hey Emyr, Looking at your stack trace below, my guess is that you have two conflicting Apache POI jars in your classpath. The odd stack trace is indicative of that, as the class loader is likely loading some other version of the DirectoryNode class that doesn't have the iterator method. java.lang.NoSuchMethodError: org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator; Thanks, Paul Ramirez On May 5, 2011, at 6:36 AM, Emyr James wrote: Hi All, I have solr and tika installed and am happily extracting and indexing various files. Unfortunately on some word documents it blows up, since it tries to auto-generate a 'title' field but my title field in the schema is single valued. [...] For the above my url was... http://localhost:8080/solr/update/extract?literal.id=3922&defaultField=content&fmap.content=content&uprefix=ignored_&stream.contentType=application%2Fvnd.ms-powerpoint&commit=true&literal.title=Reactor+cycle+141&literal.not
Re: UIMA analysisEngine path
Tommaso, Thanks. Now Solr finds the descriptor; however, I think this is very bad practice. Descriptors really aren't meant to be jarred up. They often contain relative paths. For example, in my case I have a directory that looks like:
appassemble
|- desc
|- pear
where the AnalysisEngine descriptor contained in desc is an aggregate analysis engine and refers to other analysis engines packaged as installed PEAR files in the pear subdirectory. As such, the descriptor contains relative paths pointing into the pear subdirectory. Grabbing the descriptor from the jar breaks that, since OverridingParamsAEProvider uses the XMLInputSource method signature without a relative path. Barry On 5/4/2011 6:16 AM, Tommaso Teofili wrote: Hello Barry, the main AnalysisEngine descriptor defined inside the analysisEngine element should be inside one of the jars imported with the lib elements. At the moment it cannot be taken from expanded directories, but it should be easy to do it (and indeed useful) by modifying the OverridingParamsAEProvider class [1] at line 57. Hope this helps, Tommaso [1] : http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_1/solr/contrib/uima/src/main/java/org/apache/solr/uima/processor/ae/OverridingParamsAEProvider.java?view=markup 2011/5/3 Barry Hathaway bhath...@nycap.rr.com I'm new to Solr and trying to get it to call a UIMA aggregate analysis engine and not having much luck. The null pointer exception indicates that it can't find the xml file associated with the engine. I have tried a number of combinations of a path in the analysisEngine element, but nothing seems to work. In addition, I've put the directory containing the descriptor both in the classpath when starting the server and in a lib element in solrconfig.xml. So: What classpath does the analysisEngine tag effectively search to locate the descriptor? Do the lib entries in solrconfig.xml affect this classpath? Do the engine descriptors have to be in a jar, or can they be in an expanded directory?
Thanks in advance. Barry
Re: How do I debug Unable to evaluate expression using this context printed at start?
While the question remains valid, I found the reason for my problem. While backing up, I had saved Tomcat's context descriptor file in my $SOLR_HOME, and Solr was trying to read it as described in the SolrCore wiki (http://wiki.apache.org/solr/CoreAdmin). What saved me was remembering Chris's earlier remark (http://markmail.org/thread/3y4zqieyjqfi5vl3). Thank you Chris! On Thu, May 5, 2011 at 2:58 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: I've tried to re-install solr on tomcat, and now when I launch tomcat in debug mode I see the following exception relating to solr. It's not enough to understand the problem (and fix it), but I don't know where to look for more (or what to do). Please help me. Following the tutorial and discussion here, this is my context descriptor (solr.xml):
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/Users/simpatico/SOLR_HOME/dist/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/Users/simpatico/SOLR_HOME" override="true"/>
</Context>
(the war exists)
$ ls $SOLR_HOME/dist/solr.war
/Users/simpatico/SOLR_HOME//dist/solr.war
$ ls $SOLR_HOME/conf/solrconfig.xml
/Users/simpatico/SOLR_HOME//conf/solrconfig.xml
When Tomcat starts:
INFO: Using JNDI solr.home: /Users/simpatico/SOLR_HOME
May 5, 2011 2:46:50 PM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/Users/simpatico/SOLR_HOME/'
...
INFO: Adding 'file:/Users/simpatico/SOLR_HOME/lib/wstx-asl-3.2.7.jar' to classloader
May 5, 2011 2:46:50 PM org.apache.solr.common.SolrException log
SEVERE: javax.xml.transform.TransformerException: Unable to evaluate expression using this context
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:363)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:303)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:242)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:98)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4382)
at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5040)
at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5035)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.RuntimeException: Unable to evaluate expression using this context
at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:212)
at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:210)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:335)
... 18 more
SpellCheckComponent issue
Hi, (Sorry, emailing again because the last post was not posted...) I have been using the Solr SpellCheckComponent. One of my requirements is that if a user types something like "add", Solr would return "adidas". To get something like this, I used EdgeNGramFilterFactory and applied it to the fields that I am indexing. So for "adidas" I will have something like a, ad, adi, adid... Correct me if I'm wrong, but shouldn't the distance algorithm used internally match "adidas" with this approach? Thanks, Sid
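For reference, a hedged sketch of the kind of analyzer chain being described, assuming a schema.xml field type along these lines (the type name and gram sizes are illustrative, not taken from the post):

```xml
<fieldType name="text_edge" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- At index time "adidas" yields a, ad, adi, adid, adida, adidas -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that "add" is not itself one of the grams of "adidas", so whether the spellchecker suggests it depends on how close the configured string-distance measure considers "add" to grams like "adi" or "adid".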
Re: fast case-insensitive autocomplete
Hi, Try this solution using a Solr core: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 5. mai 2011, at 15.22, Kusenda, Brandyn J wrote: Hi. I need an autocomplete solution to handle case-insensitive queries but return the original text with the case still intact. I've experimented with both the Suggester and TermComponent methods. TermComponent works when I use the regex option; however, it is far too slow. I get the speed I want by using term.prefix or by using the suggester, but it's case sensitive. Here is an example operating on a user directory: Query: bran Results: Branden Smith, Brandon Thompson, Brandon Verner, Brandy Finny, Brian Smith, ... A solution that I would expect to work would be to store two fields: one containing the original text and the other containing the lowercase. Then convert the query to lower case, run the query against the lower case field, and return the original (case preserved) field. Unfortunately, I can't get a TermComponent query to return additional fields; it only returns the field it's searching against. Should this work, or can I only return additional fields for standard queries? Thanks in advance, Brandyn
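The two-field approach described above could look roughly like this in schema.xml; this is a sketch, not a tested configuration, and the field and type names are illustrative:

```xml
<!-- Original text, stored with case intact for display -->
<field name="name" type="string" indexed="false" stored="true"/>
<!-- Lowercased copy used only for case-insensitive matching -->
<field name="name_lc" type="text_lc" indexed="true" stored="false"/>
<copyField source="name" dest="name_lc"/>

<!-- Minimal lowercasing analyzer for the match field -->
<fieldType name="text_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The idea is to lowercase the user's prefix, match it against name_lc, and display the stored name field; as Brandyn notes, the catch is that TermsComponent-style prefix lookups return only terms from the field searched, so a standard query (or a suggester over a dedicated core, as in the linked blog post) is needed to get the cased original back.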
RE: Is it possible to build Solr as a maven project?
Hi Gabriele, On 5/5/2011 at 9:57 AM, Gabriele Kahlout wrote: Okay, that sequence worked, but then shouldn't I be able to do $ mvn install afterwards? This is what I get: ... COMPILATION ERROR : - org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27] package com.google.common.io does not exist org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package com.google.common.collect does not exist ... mvn install should work, but it doesn't - I can reproduce this error on my machine. This is a bug in the Maven build. The nightly Lucene/Solr Maven build on Jenkins should have caught this compilation failure three weeks ago, when Dawid Weiss committed his work under https://issues.apache.org/jira/browse/SOLR-2378. Unfortunately, the nightly builds were using the results of compilation under the Ant build, rather than compiling from scratch. I have committed a fix to the nightly build script so this won't happen again. The Maven build bug is that the Solr-core Google Guava dependency was scoped as test-only. Until SOLR-2378, that was true, but it is no longer. So the fix is simply to remove <scope>test</scope> from the dependency declaration in the Solr-core POM. I've committed this too. If you svn update you will get these two fixes. Thank you very much for persisting, and reporting the problems you have encountered. Steve
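The fix Steve describes amounts to a dependency declaration along these lines in the Solr-core POM (the version shown is illustrative, not taken from the thread):

```xml
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>r05</version>
  <!-- previously <scope>test</scope>; removing the scope makes Guava
       available on the compile classpath, not just for tests -->
</dependency>
```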
Re: apache-solr-3.1 slow stats component queries
Hi, I benchmarked the slow stats queries (6-point estimate) using the same hardware on an index of size 104M. We use a Solr/Lucene 3.1 mod which returns only the sum and count for statistics component results. Solr/Lucene is run on Jetty. The relationship between query time and the size of the set of found documents is linear when using the stats component (R^2 > 0.99). I guess this is expected, as the application needs to scan/sum up the stat field for all matching documents? Are there any plans for caching stat results for a certain stat field along with the documents that match a filter query? Any other ideas that could help to improve this (hardware/software configuration)? Even for a subset of 10M entries, the stat search takes on the order of 10 seconds. Thanks in advance. Johannes 2011/4/18 Johannes Goll johannes.g...@gmail.com any ideas why in this case the stats summaries are so slow? Thank you very much in advance for any ideas/suggestions. Johannes 2011/4/5 Johannes Goll johannes.g...@gmail.com Hi, thank you for making the new apache-solr-3.1 available. I have installed the version from http://apache.tradebit.com/pub//lucene/solr/3.1.0/ and am running into very slow stats component queries (~1 minute) for fetching the computed sum of the stats field: url: ?q=*:*&start=0&rows=0&stats=true&stats.field=weight <int name="QTime">52825</int> #documents: 78,359,699 total RAM: 256G vm arguments: -server -Xmx40G the stats.field specification is as follows: <field name="weight" type="pfloat" indexed="true" stored="false" required="true" multiValued="false" default="1"/> filter queries that narrow down the #docs help to reduce it - QTime seems to be proportional to the number of docs being returned by a filter query. Is there any way to improve the performance of such stats queries? Caching only helped to improve the filter query performance, but if larger subsets are being returned, QTime increases unacceptably.
Since I only need the sum and not the STD or sumOfSquares/min/max, I have created a custom 3.1 version that only returns the sum. But this only slightly improved the performance. Of course I could somehow cache the larger sum queries on the client side, but I want to do this only as a last resort. Thank you very much in advance for any ideas/suggestions. Johannes -- Johannes Goll 211 Curry Ford Lane Gaithersburg, Maryland 20878
Re: Is it possible to build Solr as a maven project?
Thanks Steve, this will be really simpler next time :) Is it documented somewhere? If not, perhaps we could add something to this page, for example: http://wiki.apache.org/solr/FrontPage#Solr_Development or here: http://wiki.apache.org/solr/NightlyBuilds Ludovic. 2011/5/5 steve_rowe [via Lucene] ml-node+2904178-33932273-383...@n3.nabble.com: [...] - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-build-Solr-as-a-maven-project-tp2898068p2904375.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is it possible to build Solr as a maven project?
Steven, thank you! $ mvn -DskipTests=true install works!
[INFO] Reactor Summary:
[INFO]
[INFO] Grandparent POM for Apache Lucene Java and Apache Solr  SUCCESS [13.142s]
[INFO] Lucene parent POM ................................ SUCCESS [0.345s]
[INFO] Lucene Core ...................................... SUCCESS [18.448s]
[INFO] Lucene Test Framework ............................ SUCCESS [3.560s]
[INFO] Lucene Common Analyzers .......................... SUCCESS [7.739s]
[INFO] Lucene Contrib Ant ............................... SUCCESS [1.265s]
[INFO] Lucene Contrib bdb ............................... SUCCESS [1.332s]
[INFO] Lucene Contrib bdb-je ............................ SUCCESS [1.321s]
[INFO] Lucene Database aggregator POM ................... SUCCESS [0.242s]
[INFO] Lucene Demo ...................................... SUCCESS [1.813s]
[INFO] Lucene Memory .................................... SUCCESS [2.412s]
[INFO] Lucene Queries ................................... SUCCESS [2.275s]
[INFO] Lucene Highlighter ............................... SUCCESS [2.985s]
[INFO] Lucene InstantiatedIndex ......................... SUCCESS [2.170s]
[INFO] Lucene Lucli ..................................... SUCCESS [1.814s]
[INFO] Lucene Miscellaneous ............................. SUCCESS [1.998s]
[INFO] Lucene Query Parser .............................. SUCCESS [2.755s]
[INFO] Lucene Spatial ................................... SUCCESS [1.314s]
[INFO] Lucene Spellchecker .............................. SUCCESS [1.535s]
[INFO] Lucene Swing ..................................... SUCCESS [1.233s]
[INFO] Lucene Wordnet ................................... SUCCESS [1.309s]
[INFO] Lucene XML Query Parser .......................... SUCCESS [1.483s]
[INFO] Lucene Contrib aggregator POM .................... SUCCESS [0.151s]
[INFO] Lucene ICU Analysis Components ................... SUCCESS [2.728s]
[INFO] Lucene Phonetic Filters .......................... SUCCESS [1.765s]
[INFO] Lucene Smart Chinese Analyzer .................... SUCCESS [3.709s]
[INFO] Lucene Stempel Analyzer .......................... SUCCESS [4.241s]
[INFO] Lucene Analysis Modules aggregator POM ........... SUCCESS [0.213s]
[INFO] Lucene Benchmark ................................. SUCCESS [2.926s]
[INFO] Lucene Modules aggregator POM .................... SUCCESS [0.307s]
[INFO] Apache Solr parent POM ........................... SUCCESS [0.233s]
[INFO] Apache Solr Solrj ................................ SUCCESS [3.780s]
[INFO] Apache Solr Core ................................. SUCCESS [9.693s]
[INFO] Apache Solr Search Server ........................ SUCCESS [6.739s]
[INFO] Apache Solr Test Framework ....................... SUCCESS [2.699s]
[INFO] Apache Solr Analysis Extras ...................... SUCCESS [3.868s]
[INFO] Apache Solr Clustering ........................... SUCCESS [6.736s]
[INFO] Apache Solr DataImportHandler .................... SUCCESS [4.914s]
[INFO] Apache Solr DataImportHandler Extras ............. SUCCESS [2.721s]
[INFO] Apache Solr DataImportHandler aggregator POM ..... SUCCESS [0.253s]
[INFO] Apache Solr Content Extraction Library ........... SUCCESS [1.909s]
[INFO] Apache Solr - UIMA integration ................... SUCCESS [1.922s]
[INFO] Apache Solr Contrib aggregator POM ............... SUCCESS [0.211s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 2:18.040s
[INFO] Finished at: Thu May 05 20:39:09 CEST 2011
[INFO] Final Memory: 38M/90M
[INFO]
On Thu, May 5, 2011 at 6:53 PM, Steven A Rowe sar...@syr.edu wrote: [...]
OverlappingFileLockException when concurrent commits in solr
Hello, I'm using solr version 1.4.0 with tomcat 6. I have 2 Solr instances running as 2 different web apps with separate data folders. My application requires frequent commits from multiple clients. I've noticed that when more than one client tries to commit at the same time, these OverlappingFileLockException errors start to appear. Can anything be done to rectify this problem? Please find the error log below. Thanks
---
HTTP Status 500 - null java.nio.channels.OverlappingFileLockException
at sun.nio.ch.FileChannelImpl$SharedFileLockTable.checkList(FileChannelImpl.java:1215)
at sun.nio.ch.FileChannelImpl$SharedFileLockTable.add(FileChannelImpl.java:1117)
at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:923)
at java.nio.channels.FileChannel.tryLock(FileChannel.java:978)
at org.apache.lucene.store.NativeFSLock.obtain(NativeFSLockFactory.java:233)
at org.apache.lucene.store.Lock.obtain(Lock.java:73)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1550)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1407)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1317)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:636)
DIH for e-mails
I'm using the Data Import Handler to index emails. The problem is that I want to add my own field, such as security_number. Does someone have any idea? Regards, -- James Bond Fang
Re: DIH for e-mails
The best way to add your own fields is to create a custom Transformer sub-class. See: http://www.lucidimagination.com/search/out?u=http%3A%2F%2Fwiki.apache.org%2Fsolr%2FDataImportHandler This will guide you through the steps. Peter 2011/5/5 方振鹏 michong900...@xmu.edu.cn: I'm using the Data Import Handler to index emails. The problem is that I want to add my own field, such as security_number. Does someone have any idea? Regards, Jame Bond Fang
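For simple cases, DIH's built-in TemplateTransformer can also add a field without writing a custom Transformer subclass. A hedged sketch of a data-config.xml entity; all attribute values, the mail-server settings, and the ${emails.messageId} variable are illustrative, not taken from the thread:

```xml
<entity name="emails" processor="MailEntityProcessor"
        user="someone@example.com" password="..."
        host="imap.example.com" protocol="imaps" folders="inbox"
        transformer="TemplateTransformer">
  <!-- Hypothetical added field built from another column of the row -->
  <field column="security_number" template="SN-${emails.messageId}"/>
</entity>
```

If the value needs real computation rather than templating, a custom Transformer subclass overriding transformRow (as Peter suggests) is the way to go.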
Re: How do i I modify XMLWriter to write foobar?
: $ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml : queryResponseWriter name=xml class=org.apache.solr.request.XMLResponseWriter default=true/ : : Now I comment the line in solrconfig.xml, and there's no more writer. : $ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml : : I make a query, and the XMLResponseWriter is still in charge. : $ curl -L http://localhost:8080/solr/select?q=apache : ?xml version=1.0 encoding=UTF-8? ... Your example request is not specifying a wt param. In addition to the response writers declared in your solrconfig.xml, there are response writers that exist implicitly unless you define your own instances that override those names (xml, json, python, etc...). The real question is: what writer do you *want* to have used when no wt is specified? Whatever the answer is: declare an instance of that writer with default=true in your solrconfig.xml -Hoss
Re: Indexing 20M documents from MySQL with DIH
{quote} ... Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost. at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989) ... 22 more Apr 21, 2011 3:53:28 AM org.apache.solr.handler.dataimport.EntityProcessorBase getNext SEVERE: getNext() failed for query 'REDACTED' org.apache.solr.handler.dataimport.DataImportHandlerException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure The last packet successfully received from the server was 128 milliseconds ago. The last packet sent successfully to the server was 25,273,484 milliseconds ago. ... {quote} It could probably be because of autocommit / segment merging. You could try to disable autocommit / increase mergeFactor. {quote} I've used Sphinx in the past, which uses multiple queries to pull out a subset of records ranged based on primary key; does Solr offer functionality similar to this? It seems that once a Solr index gets to a certain size, the indexing of a batch takes longer than MySQL's net_write_timeout, so it kills the connection. {quote} I was thinking about some hackish solution to paginate results: entity name=pages query=SELECT id FROM generate_series( (SELECT count(*) from source_table) / 1000 ) ... entity name=records query=SELECT * from source_table LIMIT 1000 OFFSET ${pages.id}*1000 /entity /entity Or something along those lines (you'd need to calculate the offset in the pages query). But unfortunately MySQL does not provide a generate_series function (it's a Postgres function, and there are similar solutions for Oracle and MSSQL). On Mon, Apr 25, 2011 at 3:59 AM, Scott Bigelow eph...@gmail.com wrote: Thank you everyone for your help. I ended up getting the index to work using the exact same config file on a (substantially) larger instance.
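A MySQL-only variant of the pagination idea above could emulate generate_series with a session-variable counter. A rough DIH sketch, with hypothetical table, column, and page-size values; note that MySQL requires a literal OFFSET, so the outer query emits the offset itself rather than relying on `${...}*1000` arithmetic (DIH only does string substitution):

```xml
<!-- Sketch only. "source_table" and "id" are placeholder names.
     The outer entity emits offsets 0, 1000, 2000, ... by abusing a
     MySQL session variable; adjust LIMIT to ceil(row_count / 1000). -->
<entity name="pages"
        query="SELECT @off := @off + 1000 AS off
               FROM source_table, (SELECT @off := -1000) init
               LIMIT 1000">
  <entity name="records"
          query="SELECT * FROM source_table
                 ORDER BY id
                 LIMIT 1000 OFFSET ${pages.off}">
    <!-- field mappings go here -->
  </entity>
</entity>
```

Each inner query then stays short enough to finish inside net_write_timeout, at the cost of re-scanning for the OFFSET on each page.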
On Fri, Apr 22, 2011 at 5:46 AM, Erick Erickson erickerick...@gmail.com wrote: {{{A custom indexer, so that's a fairly common practice? So when you are dealing with these large indexes, do you try not to fully rebuild them when you can? It's not a nightly thing, but something to do in case of a disaster? Is there a difference in the performance of an index that was built all at once vs. one that has had delta inserts and updates applied over a period of months?}}} Is it a common practice? Like all of this, it depends. It's certainly easier to let DIH do the work. Sometimes DIH doesn't have all the capabilities necessary. Or as Chris said, in the case where you already have a system built up and it's easier to just grab the output from that and send it to Solr, perhaps with SolrJ and not use DIH. Some people are just more comfortable with their own code... Do you try not to fully rebuild. It depends on how painful a full rebuild is. Some people just like the simplicity of starting over every day/week/month. But you *have* to be able to rebuild your index in case of disaster, and a periodic full rebuild certainly keeps that process up to date. Is there a difference...delta inserts...updates...applied over months. Not if you do an optimize. When a document is deleted (or updated), it's only marked as deleted. The associated data is still in the index. Optimize will reclaim that space and compact the segments, perhaps down to one. But there's no real operational difference between a newly-rebuilt index and one that's been optimized. If you don't delete/update, there's not much reason to optimize either I'll leave the DIH to others.. Best Erick On Thu, Apr 21, 2011 at 8:09 PM, Scott Bigelow eph...@gmail.com wrote: Thanks for the e-mail. I probably should have provided more details, but I was more interested in making sure I was approaching the problem correctly (using DIH, with one big SELECT statement for millions of rows) instead of solving this specific problem. 
RE: Is it possible to build Solr as a maven project?
You're welcome, I'm glad you got it to work. - Steve -Original Message- From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com] Sent: Thursday, May 05, 2011 2:41 PM To: solr-user@lucene.apache.org Subject: Re: Is it possible to build Solr as a maven project? Steven, thank you! $ mvn -DskipTests=true install works! [INFO] Reactor Summary: [INFO] [INFO] Grandparent POM for Apache Lucene Java and Apache Solr SUCCESS [13.142s] [INFO] Lucene parent POM . SUCCESS [0.345s] [INFO] Lucene Core ... SUCCESS [18.448s] [INFO] Lucene Test Framework . SUCCESS [3.560s] [INFO] Lucene Common Analyzers ... SUCCESS [7.739s] [INFO] Lucene Contrib Ant SUCCESS [1.265s] [INFO] Lucene Contrib bdb SUCCESS [1.332s] [INFO] Lucene Contrib bdb-je . SUCCESS [1.321s] [INFO] Lucene Database aggregator POM SUCCESS [0.242s] [INFO] Lucene Demo ... SUCCESS [1.813s] [INFO] Lucene Memory . SUCCESS [2.412s] [INFO] Lucene Queries SUCCESS [2.275s] [INFO] Lucene Highlighter SUCCESS [2.985s] [INFO] Lucene InstantiatedIndex .. SUCCESS [2.170s] [INFO] Lucene Lucli .. SUCCESS [1.814s] [INFO] Lucene Miscellaneous .. SUCCESS [1.998s] [INFO] Lucene Query Parser ... SUCCESS [2.755s] [INFO] Lucene Spatial SUCCESS [1.314s] [INFO] Lucene Spellchecker ... SUCCESS [1.535s] [INFO] Lucene Swing .. SUCCESS [1.233s] [INFO] Lucene Wordnet SUCCESS [1.309s] [INFO] Lucene XML Query Parser ... SUCCESS [1.483s] [INFO] Lucene Contrib aggregator POM . SUCCESS [0.151s] [INFO] Lucene ICU Analysis Components SUCCESS [2.728s] [INFO] Lucene Phonetic Filters ... SUCCESS [1.765s] [INFO] Lucene Smart Chinese Analyzer . SUCCESS [3.709s] [INFO] Lucene Stempel Analyzer ... SUCCESS [4.241s] [INFO] Lucene Analysis Modules aggregator POM SUCCESS [0.213s] [INFO] Lucene Benchmark .. SUCCESS [2.926s] [INFO] Lucene Modules aggregator POM . SUCCESS [0.307s] [INFO] Apache Solr parent POM SUCCESS [0.233s] [INFO] Apache Solr Solrj . SUCCESS [3.780s] [INFO] Apache Solr Core .. SUCCESS [9.693s] [INFO] Apache Solr Search Server . 
SUCCESS [6.739s] [INFO] Apache Solr Test Framework SUCCESS [2.699s] [INFO] Apache Solr Analysis Extras ... SUCCESS [3.868s] [INFO] Apache Solr Clustering SUCCESS [6.736s] [INFO] Apache Solr DataImportHandler . SUCCESS [4.914s] [INFO] Apache Solr DataImportHandler Extras .. SUCCESS [2.721s] [INFO] Apache Solr DataImportHandler aggregator POM .. SUCCESS [0.253s] [INFO] Apache Solr Content Extraction Library SUCCESS [1.909s] [INFO] Apache Solr - UIMA integration SUCCESS [1.922s] [INFO] Apache Solr Contrib aggregator POM SUCCESS [0.211s] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 2:18.040s [INFO] Finished at: Thu May 05 20:39:09 CEST 2011 [INFO] Final Memory: 38M/90M [INFO] On Thu, May 5, 2011 at 6:53 PM, Steven A Rowe sar...@syr.edu wrote: Hi Gabriele, On 5/5/2011 at 9:57 AM, Gabriele Kahlout wrote: Okay, that sequence worked, but then shouldn't I be able to do $ mvn install afterwards? This is what I get: ... COMPILATION ERROR : - org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27] package com.google.common.io does not exist org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package com.google.common.collect does not exist ... mvn install should work, but it doesn't - I can reproduce this error on my machine. This is a bug in the Maven build. The nightly Lucene/Solr Maven build on Jenkins should have caught this compilation failure three weeks ago, when Dawid Weiss committed his work under
Re: SpellCheckComponent issue
Hi Sid, unfortunately not; as far as I know it is not possible to realize your requirements with Solr's spellcheck packages (I am talking about v1.4, since there are some changes in 3.1). Regards, Em -- View this message in context: http://lucene.472066.n3.nabble.com/SpellCheckComponent-issue-tp2903926p2904839.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr: org.apache.solr.common.SolrException: Invalid Date String:
Hi, I am new to Solr and this is my first attempt at indexing Solr data. I am getting the following exception while indexing: org.apache.solr.common.SolrException: Invalid Date String:'2011-01-07' at org.apache.solr.schema.DateField.parseMath(DateField.java:165) at org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:169) at org.apache.solr.schema.SchemaField.createField(SchemaField.java:98) at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204) at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277) I understand from reading some articles that Solr stores time only in UTC. This is the query I am trying to index: Select id,text,'language',links,tweetType,source,location,bio,url,utcOffset,timeZone,frenCnt,createdAt,createdOnGMT,createdOnServerTime,follCnt,favCnt,totStatusCnt,usrCrtDate,humanSentiment,replied,replyMsg,classified,locationDetail,geonameid,country,continent,placeLongitude,placeLatitude,listedCnt,hashtag,mentions,senderInfScr,createdOnGMTDate,DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') as IST,DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+01:00'),'%Y-%m-%d') as ECT,DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+02:00'),'%Y-%m-%d') as EET,DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+03:30'),'%Y-%m-%d') as MET,sign(classified) as sentiment from Why I am doing this timezone conversion is because I need to group results by the user timezone. How can I achieve this? Regards, Rohit
Re: How do i I modify XMLWriter to write foobar?
I've now tried to write my own QueryResponseWriter plugin[1], as a Maven project depending on Solr Core 3.1, which is the same version of Solr I've installed. It seems I'm not able to get rid of some cache. $ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml queryResponseWriter name=xml class=org.apache.solr.request.XMLResponseWriter/ queryResponseWriter name=Test class=com.mysimpatico.me.indexplugins.TestQueryResponseWriter default=true/ Restarted Tomcat after changing solrconfig.xml and placing indexplugins.jar in $SOLR_HOME/ At Tomcat boot: INFO: Adding 'file:/Users/simpatico/SOLR_HOME/lib/IndexPlugins.jar' to classloader I get the legacy behavior of the plugin for both, and I don't understand why. At least the xml one should be different. Why could this be? How do I find out? http://localhost:8080/solr/select?q=apache&wt=Test and http://localhost:8080/solr/select?q=apache&wt=xml XML Parsing Error: syntax error Location: http://localhost:8080/solr/select?q=apache&wt=xml Line Number 1, Column 1: foobarresponseHeaderstatusQTimeparamsqapachewtxmlresponse00foobar ^ The new code for TestQueryResponseWriter[1] seems to never be executed, since I added a severe log statement that doesn't appear in the Tomcat logs. Where are those caches? Thank you in advance. [1] package com.mysimpatico.me.indexplugins; import java.io.*; import java.util.logging.Level; import java.util.logging.Logger; import org.apache.solr.request.XMLResponseWriter; /** * Hello world!
* */ public class TestQueryResponseWriter extends XMLResponseWriter { @Override public void write(Writer writer, org.apache.solr.request.SolrQueryRequest request, org.apache.solr.response.SolrQueryResponse response) throws IOException { Logger.getLogger(TestQueryResponseWriter.class.getName()).log(Level.SEVERE, "Hello from TestQueryResponseWriter"); super.write(writer, request, response); } } On Thu, May 5, 2011 at 9:01 PM, Chris Hostetter hossman_luc...@fucit.org wrote: ... Whatever the answer is: declare an instance of that writer with default=true in your solrconfig.xml -Hoss -- Regards, K. Gabriele
Re: Is it possible to build Solr as a maven project?
Just for the reference. $ svn update At revision 1099940. On Thu, May 5, 2011 at 9:14 PM, Steven A Rowe sar...@syr.edu wrote: You're welcome, I'm glad you got it to work. - Steve ...
Custom sorting based on external (database) data
Hi, Sorry for the possible double post, I wrote this up but had the incorrect sender address, so I am guessing that my previous one is going to be rejected by the list moderation daemon. I am trying to figure out options for the following problem. I am on Solr 1.4.1 (Lucene 2.9.1). I have search results which are going to be ranked by the user (using a thumbs up/down) and would translate to a score between -1 and +1. This data is stored in a database table (unique_id, thumbs_up, thumbs_down, num_calls) that is updated as the thumbs up/down component is clicked. We want to be able to sort the results by the following: score = (thumbs_up - thumbs_down) / (num_calls). The unique_id field refers to the one referenced as uniqueId in the schema.xml. Based on the following conversation: http://www.mail-archive.com/solr-user@lucene.apache.org/msg06322.html ...my understanding is that I need to: 1) subclass FieldType to create my own RankFieldType. 2) In this class, override the getSortField() method to return my custom FieldSortComparatorSource object. 3) Build the custom FieldSortComparatorSource object, which returns a custom FieldSortComparator object in newComparator(). 4) Configure a field type of class RankFieldType (rank_t), and a field (called rank) of field type rank_t in schema.xml. 5) Use sort=rank+desc to do the sort. My question is: is there a simpler/more performant way? The number of database lookups seems like it's going to be pretty high with this approach. And it's hard to believe that my problem is new, so I am guessing this is either part of some Solr configuration I am missing, or there is some other (possibly simpler) approach I am overlooking. Pointers to documentation or code (or even keywords I could google) would be much appreciated. TIA for all your help, Sujit
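The scoring formula above is simple enough to compute outside Solr when dumping the table; a minimal sketch (class and parameter names are mine, and the zero-calls case, which the formula leaves undefined, is pinned to 0):

```java
public class RankScore {
    // (thumbs_up - thumbs_down) / num_calls, which lands in [-1, 1];
    // returns 0 for documents that were never shown (num_calls == 0)
    public static double score(int thumbsUp, int thumbsDown, int numCalls) {
        if (numCalls == 0) {
            return 0.0;
        }
        return (double) (thumbsUp - thumbsDown) / numCalls;
    }
}
```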
Re: Custom sorting based on external (database) data
--- On Thu, 5/5/11, Sujit Pal sujit@comcast.net wrote: ... Pointers to documentation or code (or even keywords I could google) would be much appreciated. Looks like it can be done with http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html and http://wiki.apache.org/solr/FunctionQuery You can dump your table into three text files. Issue a commit to load these changes. Sort by function query is available in Solr3.1 though.
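For reference, if ExternalFileField fits, the external file lives in the index data directory and maps your key field to a float, one entry per line. A sketch based on the 1.4-era docs (field and file names are illustrative; check the attributes against your Solr version):

```xml
<!-- schema.xml sketch; attribute set per the ExternalFileField docs -->
<fieldType name="rankFile" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="float"/>
<field name="rank" type="rankFile" indexed="false" stored="false"/>

<!-- data/external_rank: one "key=value" pair per line, reloaded on commit:
     doc1=0.75
     doc2=-0.5
     doc3=0.1
-->
```

Regenerating this file from the database table and issuing a commit replaces all scores at once, with no reindexing.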
Re: Custom sorting based on external (database) data
Thank you Ahmet, looks like we could use this. Basically we would do periodic dumps of the (unique_id|computed_score) pairs sorted by score and write them out to this file, followed by a commit. Found some more info here, for the benefit of others looking for something similar: http://dev.tailsweep.com/solr-external-scoring/ On Thu, 2011-05-05 at 13:12 -0700, Ahmet Arslan wrote: ...
force 0 results from within a search component?
Hi guys, another question on custom search components: Is there any way to force the response to be 0 results from within a search component (and break out of the component chain)? I'm doing some checks in my first-component and in some cases would like to stop processing the request and just pretend that there are 0 results... Thanks, Fred.
Re: fast case-insensitive autocomplete
Hi, I haven't used Suggester yet, but couldn't you feed it all-lowercase content and then lowercase whatever the user is typing before sending it to Suggester, to avoid case mismatch? Autocomplete on http://search-lucene.com/ uses http://sematext.com/products/autocomplete/index.html if you want a shortcut. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Kusenda, Brandyn J brandyn-kuse...@uiowa.edu To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Thu, May 5, 2011 9:22:03 AM Subject: fast case-insensitive autocomplete Hi. I need an autocomplete solution to handle case-insensitive queries but return the original text with the case still intact. I've experimented with both the Suggester and TermComponent methods. TermComponent works when I use the regex option; however, it is far too slow. I get the speed I want by using term.prefix or by using the Suggester, but it's case sensitive. Here is an example operating on a user directory: Query: bran Results: Branden Smith, Brandon Thompson, Brandon Verner, Brandy Finny, Brian Smith, ... A solution that I would expect to work would be to store two fields; one containing the original text and the other containing the lowercase. Then convert the query to lower case, run the query against the lower case field, and return the original (case preserved) field. Unfortunately, I can't get a TermComponent query to return additional fields. It only returns the field it's searching against. Should this work, or can I only return additional fields for standard queries? Thanks in advance, Brandyn
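Outside of Solr, the two-field idea in the quoted message boils down to keying on a lowercased copy while returning the original text; a toy sketch of just that lookup logic (not Solr code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

public class PrefixSuggest {
    // lowercased name -> original name (case preserved)
    private final TreeMap<String, String> index = new TreeMap<>();

    public void add(String name) {
        index.put(name.toLowerCase(Locale.ROOT), name);
    }

    public List<String> suggest(String prefix) {
        String lo = prefix.toLowerCase(Locale.ROOT);
        // every key starting with the prefix sorts in [lo, lo + U+FFFF)
        return new ArrayList<>(index.subMap(lo, lo + Character.MAX_VALUE).values());
    }
}
```

The same shape is what feeding Suggester lowercased keys achieves: the match is case-insensitive, but the stored payload keeps the original casing.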
Re: force 0 results from within a search component?
Is there any way to force the response to be 0 results from within a search component (and break out of the component chain)? I'm doing some checks in my first-component and in some cases would like to stop processing the request and just pretend that there are 0 results... Yes. You can disable all underlying components via their parameters: setParam("query", false); setParam("facet", false); setParam("hl", false); etc.
Re: why query chinese character with bracket become phrase query by default?
Nice, it works like a charm. I am using Solr 1.4.1. Here is my configuration for the Chinese field: fieldType name=text_ch class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.ChineseTokenizerFactory/ /analyzer analyzer type=query tokenizer class=solr.ChineseTokenizerFactory/ filter class=solr.PositionFilterFactory/ /analyzer /fieldType Now I get the expected hassle-free parsing on the Solr side: lst name=debug str name=rawquerystringtitle_zh_CN:(我活)/str str name=querystringtitle_zh_CN:(我活)/str str name=parsedquerytitle_zh_CN:我 title_zh_CN:活/str str name=parsedquery_toStringtitle_zh_CN:我 title_zh_CN:活/str -- View this message in context: http://lucene.472066.n3.nabble.com/why-query-chinese-character-with-bracket-become-phrase-query-by-default-tp2901542p2905784.html Sent from the Solr - User mailing list archive at Nabble.com.
Thoughts on Search Analytics?
Hi, I'd like to solicit your thoughts about Search Analytics if you are doing any sort of analysis/reporting of search logs or click stream or anything related. * Which information or reports do you find the most useful and why? * Which reports would you like to have, but don't have for whatever reason (don't have the needed data, or it's too hard to produce such reports, or ...) * Which tool(s) or service(s) do you use and find the most useful? I'm preparing a presentation on the topic of Search Analytics, so I'm trying to solicit opinions, practices, desires, etc. on this topic. Your thoughts would be greatly appreciated. If you could reply directly, that would be great, since this may be a bit OT for the list. Thanks! Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
Testing the limits of non-Java Solr
What's the probability that I can build a non-trivial Solr app without writing any Java? I've been planning to use Solr, Lucene, and existing plug-ins, and sort of hoping not to write any Java (the app itself is Ruby / Rails). The dox (such as http://wiki.apache.org/solr/FAQ) seem encouraging. [I *can* write Java, but my planning's all been no Java.] I'm just beginning the design work in earnest, and I suddenly notice that it seems every mail thread, blog, or example starts out Java-free, but somehow ends up involving Java code. I'm not sure I yet understand all these snippets; conceivably some of the Java I see could just as easily be written in another language, but it makes me wonder. Is it realistic to plan a sizable Solr application without some Java programming? I know, I know, I know: everything depends on the details. I'd be interested even in anecdotes: has anyone ever achieved this before? Also, what are the clues I should look for that I need to step into the Java realm? I understand, for example, that it's possible to write filters and tokenizers to do stuff not available in any standard one; in this case, the clue would be I can't find what I want in the standard list, I guess. Are there other things I should look for? -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
Re: Solr: org.apache.solr.common.SolrException: Invalid Date String:
org.apache.solr.common.SolrException: Invalid Date String:'2011-01-07' at org.apache.solr.schema.DateField.parseMath(DateField.java:165) Solr accepts dates in the following format: 2011-01-07T00:00:00Z I understand from reading some articles that Solr stores time only in UTC, this is the query i am trying to index, It seems that you are fetching data from a relational database. You may consider using http://wiki.apache.org/solr/DataImportHandler Why i am doing this timezone conversion is because i need to group results by the user timezone. How can i achieve this? Save the timezone info in a field and facet on that field? http://wiki.apache.org/solr/SimpleFacetParameters
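For what it's worth, producing the canonical form Solr expects is just a matter of formatting in UTC; a minimal Java sketch (class and method names are mine):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDate {
    // DateField/TrieDateField expect ISO-8601 in UTC: yyyy-MM-dd'T'HH:mm:ss'Z'
    public static String toSolrDate(Date d) {
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        df.setTimeZone(TimeZone.getTimeZone("UTC"));
        return df.format(d);
    }
}
```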
Re: Testing the limits of non-Java Solr
Short answer: Yes, you can deploy a Solr cluster and write an application that talks to it without writing any Java (but it may be in PHP or Python or ...) unless that application is you typing telnet my-solr-server 8983. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Jack Repenning jrepenn...@collab.net To: solr-user@lucene.apache.org Sent: Thu, May 5, 2011 6:28:31 PM Subject: Testing the limits of non-Java Solr ...
Re: Thoughts on Search Analytics?
When I ran the search engine at Feedster, I wrote a perl script that ran nightly and gave me:

- total number of searches
- total number of searches per hour
- N most frequent searches
- max time for a search
- min time for a search
- mean time for searches
- median time for searches
- N slowest searches
- warnings
- errors
- all the above per index (core in Solr)

The script generated a text file (for me) and an Excel spreadsheet (for the management). François On May 5, 2011, at 6:25 PM, Otis Gospodnetic wrote: Hi, I'd like to solicit your thoughts about Search Analytics if you are doing any sort of analysis/reporting of search logs or click stream or anything related.

* Which information or reports do you find the most useful and why?
* Which reports would you like to have, but don't have for whatever reason (don't have the needed data, or it's too hard to produce such reports, or ...)?
* Which tool(s) or service(s) do you use and find the most useful?

I'm preparing a presentation on the topic of Search Analytics, so I'm trying to solicit opinions, practices, desires, etc. on this topic. Your thoughts would be greatly appreciated. If you could reply directly, that would be great, since this may be a bit OT for the list. Thanks! Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
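The nightly report described above can be sketched in a few lines. This is an illustrative Python version with made-up sample latencies, not François's actual Perl script:

```python
import statistics

def search_report(times_ms):
    """Summarize per-query latencies: total, max/min/mean/median, N slowest."""
    times = sorted(times_ms)
    return {
        "total": len(times),
        "max_ms": times[-1],
        "min_ms": times[0],
        "mean_ms": statistics.mean(times),
        "median_ms": statistics.median(times),
        "slowest_5": times[-5:][::-1],  # N slowest searches, slowest first
    }

# Sample latencies in milliseconds, purely illustrative.
report = search_report([12, 7, 95, 33, 8, 41, 260, 19])
```

Per-index (per-core) breakdowns would just run the same summary over each core's log slice.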
RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:
Rohit, The Solr server using TrieDateField must receive values in the format 2011-01-07T17:00:30Z. This should be a UTC-based datetime. The offset can be applied once you get your results back from Solr:

SimpleDateFormat df = new SimpleDateFormat(format);
df.setTimeZone(TimeZone.getTimeZone("IST"));
java.util.Date dateunix = df.parse(datetime);

-Craig -Original Message- From: Rohit [mailto:ro...@in-rev.com] Sent: Friday, 6 May 2011 2:31 AM To: solr-user@lucene.apache.org Subject: Solr: org.apache.solr.common.SolrException: Invalid Date String: Hi, I am new to Solr and this is my first attempt at indexing Solr data. I am getting the following exception while indexing:

org.apache.solr.common.SolrException: Invalid Date String:'2011-01-07'
 at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
 at org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:169)
 at org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
 at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
 at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)

I understand from reading some articles that Solr stores time only in UTC. This is the query I am using to index:

Select id, text, 'language', links, tweetType, source, location, bio, url, utcOffset, timeZone, frenCnt, createdAt, createdOnGMT, createdOnServerTime, follCnt, favCnt, totStatusCnt, usrCrtDate, humanSentiment, replied, replyMsg, classified, locationDetail, geonameid, country, continent, placeLongitude, placeLatitude, listedCnt, hashtag, mentions, senderInfScr, createdOnGMTDate,
DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') as IST,
DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+01:00'),'%Y-%m-%d') as ECT,
DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+02:00'),'%Y-%m-%d') as EET,
DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+03:30'),'%Y-%m-%d') as MET,
sign(classified) as sentiment
from

Why I am doing this timezone conversion is because I need to group results by the user timezone. How can I achieve this? Regards, Rohit
Re: Solr Terms and Date field issues
Hmmm, this is puzzling. If you could come up with a couple of xml files and a schema that illustrate this, I'll see what I can see... Thanks, Erick On Wed, May 4, 2011 at 7:05 PM, Viswa S svis...@hotmail.com wrote: Erik, I suspected the same, and set up a test instance to reproduce this. The date field I used is set up to capture indexing time; in other words, the schema has a default value of NOW. However, I have reproduced this issue with fields which do not have defaults too. On the second one, I did a delete-commit (with expungeDeletes=true) and then an optimize. All other fields show updated terms except the date fields. I have also double-checked to see if the Luke handler has any different terms, and it did not. Thanks Viswa Date: Wed, 4 May 2011 08:17:39 -0400 Subject: Re: Solr Terms and Date field issues From: erickerick...@gmail.com To: solr-user@lucene.apache.org Hmmm, this *looks* like you've changed your schema without re-indexing all your data, so you're getting old (string?) values in that field, but that's just a guess. If this is really happening on a clean index it's a problem. I'm also going to guess that you're not really deleting the documents you think. Are you committing after the deletes? Best Erick On Wed, May 4, 2011 at 2:18 AM, Viswa S svis...@hotmail.com wrote: Hello, The terms query for a date field seems to get populated with some weird dates; many of these dates (1970, 2009, 2011-04-23) are not present in the indexed data. Please see sample data below. I also notice that a delete and optimize does not remove the relevant terms for date fields; the string fields seem to work fine.
Thanks Viswa Results from Terms component:

<int name="2011-05-04T02:01:32.928Z">3479</int>
<int name="2011-05-04T02:00:19.2Z">3479</int>
<int name="2011-05-03T22:34:58.432Z">3479</int>
<int name="2011-04-23T01:36:14.336Z">3479</int>
<int name="2009-03-13T13:23:01.248Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="2011-05-04T02:01:34.592Z">265</int>

Result from facet component, rounded to seconds:

<lst name="InsertTime">
  <int name="2011-05-04T02:01:32Z">1</int>
  <int name="2011-05-04T02:01:33Z">1148</int>
  <int name="2011-05-04T02:01:34Z">2333</int>
  <str name="gap">+1SECOND</str>
  <date name="start">2011-05-03T06:14:14Z</date>
  <date name="end">2011-05-04T06:14:14Z</date>
</lst>
Re: Is it possible to use sub-fields or multivalued fields for boosting?
For a truly universal field, I'm not at all sure how you'd proceed. But if you know what your sub-fields are in advance, have you considered just making them regular fields and then throwing (e)dismax at it? Best Erick On Wed, May 4, 2011 at 11:51 PM, deniz denizdurmu...@gmail.com wrote: okay... let me make the situation more clear... I am trying to create a universal field which includes information about users like firstname, surname, gender, location etc. When I enter something, e.g. London, I would like to match any users having 'London' in any field: firstname, surname or location. But if it matches name or surname, I would like to give a higher weight. So my question is... is it possible to have sub-fields? Like:

<field name="universal">
  <field name="firstname">blabla</field>
  <field name="surname">blabla</field>
  <field name="gender">blabla</field>
  <field name="location">blabla</field>
</field>

Or any other ideas for implementing such a feature? -- View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-use-sub-fields-or-multivalued-fields-for-boosting-tp2901992p2901992.html Sent from the Solr - User mailing list archive at Nabble.com.
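Erick's suggestion (regular fields plus dismax with per-field boosts) can be sketched as a query. The boost values and the field list below are illustrative, not from the thread; this just builds the request URL any HTTP client could send:

```python
from urllib.parse import urlencode

# Illustrative boosts: name fields count 3x more than location, gender least.
params = {
    "defType": "dismax",
    "q": "London",
    "qf": "firstname^3 surname^3 location^1 gender^0.2",
    "fl": "firstname,surname,location,score",
}
query_string = urlencode(params)
# Base URL assumes a default local Solr install.
url = "http://localhost:8983/solr/select?" + query_string
```

A match on firstname or surname then outranks the same term matching only location, which is the weighting deniz asked for, without any sub-field structure.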
Re: Field names with a period (.)
I remember the same, except I think I've seen the recommendation that you make all the letters lower-case. As I remember, there are some interesting edge cases that you might run into later with upper case. But I can't remember the specifics either Erick On Thu, May 5, 2011 at 10:08 AM, Leonardo Souza leonardo...@gmail.com wrote: Thanks Gora! [ ]'s Leonardo da S. Souza °v° Linux user #375225 /(_)\ http://counter.li.org/ ^ ^ On Thu, May 5, 2011 at 3:09 AM, Gora Mohanty g...@mimirtech.com wrote: On Thu, May 5, 2011 at 5:08 AM, Leonardo Souza leonardo...@gmail.com wrote: Hi guys, Can i have a field name with a period(.) ? Like in *file.size* Cannot find now where this is documented, but from what I remember it is recommended to use only characters A-Z, a-z, 0-9, and underscore (_) in field names, and some special characters are known to cause problems. Regards, Gora
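If you want to enforce the conservative convention Gora and Erick describe (lowercase letters, digits, underscore), a small validation helper might look like this. The regex is my reading of the recommendation, not an official Solr rule:

```python
import re

# Lowercase letters, digits, underscore; must not start with a digit.
SAFE_FIELD_NAME = re.compile(r"^[a-z_][a-z0-9_]*$")

def is_safe_field_name(name: str) -> bool:
    """Return True if the field name sticks to the conservative convention."""
    return SAFE_FIELD_NAME.match(name) is not None

# 'file.size' fails the check; 'file_size' passes.
```

Running every schema field name through a check like this at build time catches the period and upper-case edge cases before they surface in queries.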
Solr 3.1 returning entire highlighted field
Hi, After upgrading from Solr 1.4.0 to 3.1, our highlighting has gone from returning short snippets of text to displaying what appears to be the entire contents of the highlighted field. The request using solrj sets the following:

params.setHighlight(true);
params.setHighlightSnippets(3);
params.set("hl.fl", "content_highlight");

From solrconfig.xml:

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- Use the regex highlight fragmenter because it seems to return better results. -->
    <str name="f.text.hl.fragmenter">regex</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<highlighting>
  <!-- Configure the standard fragmenter -->
  <!-- This could most likely be commented out in the default case -->
  <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
    <lst name="defaults">
      <int name="hl.fragsize">100</int>
    </lst>
  </fragmenter>
  <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">70</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\']{20,200}</str>
    </lst>
  </fragmenter>
  <!-- Configure the standard formatter -->
  <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
    <lst name="defaults">
      <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
      <str name="hl.simple.post"><![CDATA[</strong>]]></str>
    </lst>
  </formatter>
</highlighting>

From schema.xml:

<field name="content_highlight" type="text_highlight" indexed="true" stored="true" required="false" compressed="true" termVectors="true" termPositions="true"/>

<fieldType name="text_highlight" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Any pointers anybody can provide would be greatly appreciated. Jake
RE: Solr Terms and Date field issues
Please find attached the schema and some test data (test.xml). Thanks for looking into this. Viswa Date: Thu, 5 May 2011 19:08:31 -0400 Subject: Re: Solr Terms and Date field issues From: erickerick...@gmail.com To: solr-user@lucene.apache.org Hmmm, this is puzzling. If you could come up with a couple of xml files and a schema that illustrate this, I'll see what I can see... Thanks, Erick On Wed, May 4, 2011 at 7:05 PM, Viswa S svis...@hotmail.com wrote: Erik, I suspected the same, and set up a test instance to reproduce this. The date field I used is set up to capture indexing time; in other words, the schema has a default value of NOW. However, I have reproduced this issue with fields which do not have defaults too. On the second one, I did a delete-commit (with expungeDeletes=true) and then an optimize. All other fields show updated terms except the date fields. I have also double-checked to see if the Luke handler has any different terms, and it did not. Thanks Viswa Date: Wed, 4 May 2011 08:17:39 -0400 Subject: Re: Solr Terms and Date field issues From: erickerick...@gmail.com To: solr-user@lucene.apache.org Hmmm, this *looks* like you've changed your schema without re-indexing all your data, so you're getting old (string?) values in that field, but that's just a guess. If this is really happening on a clean index it's a problem. I'm also going to guess that you're not really deleting the documents you think. Are you committing after the deletes? Best Erick On Wed, May 4, 2011 at 2:18 AM, Viswa S svis...@hotmail.com wrote: Hello, The terms query for a date field seems to get populated with some weird dates; many of these dates (1970, 2009, 2011-04-23) are not present in the indexed data. Please see sample data below. I also notice that a delete and optimize does not remove the relevant terms for date fields; the string fields seem to work fine.
Thanks Viswa Results from Terms component:

<int name="2011-05-04T02:01:32.928Z">3479</int>
<int name="2011-05-04T02:00:19.2Z">3479</int>
<int name="2011-05-03T22:34:58.432Z">3479</int>
<int name="2011-04-23T01:36:14.336Z">3479</int>
<int name="2009-03-13T13:23:01.248Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="2011-05-04T02:01:34.592Z">265</int>

Result from facet component, rounded to seconds:

<lst name="InsertTime">
  <int name="2011-05-04T02:01:32Z">1</int>
  <int name="2011-05-04T02:01:33Z">1148</int>
  <int name="2011-05-04T02:01:34Z">2333</int>
  <str name="gap">+1SECOND</str>
  <date name="start">2011-05-03T06:14:14Z</date>
  <date name="end">2011-05-04T06:14:14Z</date>
</lst>

test.xml:

<add>
  <doc>
    <field name="fullTextLog">I suspected the same, and setup a test instance to reproduce this</field>
  </doc>
  <doc>
    <field name="fullTextLog">The date field I used is setup to capture indexing time, in other words the schema has a default value of NOW</field>
  </doc>
  <doc>
    <field name="fullTextLog">However, I have reproduced this issue with fields which do not have defaults too.</field>
  </doc>
  <doc>
    <field name="fullTextLog">Lorem Ipsum is simply dummy text of the printing and typesetting industry</field>
  </doc>
  <doc>
    <field name="fullTextLog">Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.</field>
  </doc>
</add>

schema.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
 You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
-->
<!-- This is the Solr schema file. This file should be named "schema.xml" and should be in the conf directory under the solr home (i.e. ./solr/conf/schema.xml by default) or located where the classloader for the Solr webapp can find it. This example schema is the recommended starting point for users. It should be kept correct and concise, usable out-of-the-box. For more information, on how to customize this file, please see http://wiki.apache.org/solr/SchemaXml PERFORMANCE NOTE: this schema includes many optional features and should not be used for benchmarking. To improve
RE: Solr Terms and Date field issues
It is okay to see weird things in admin/schema.jsp or the terms component with trie-based types. Please see http://search-lucene.com/m/WEfSI1Yi4562/ If you really need the terms component, consider using copyField (tdate to a string type). Please find attached the schema and some test data (test.xml). Thanks for looking into this. Viswa Date: Thu, 5 May 2011 19:08:31 -0400 Subject: Re: Solr Terms and Date field issues From: erickerick...@gmail.com To: solr-user@lucene.apache.org Hmmm, this is puzzling. If you could come up with a couple of xml files and a schema that illustrate this, I'll see what I can see... Thanks, Erick On Wed, May 4, 2011 at 7:05 PM, Viswa S svis...@hotmail.com wrote: Erik, I suspected the same, and set up a test instance to reproduce this. The date field I used is set up to capture indexing time; in other words, the schema has a default value of NOW. However, I have reproduced this issue with fields which do not have defaults too. On the second one, I did a delete-commit (with expungeDeletes=true) and then an optimize. All other fields show updated terms except the date fields. I have also double-checked to see if the Luke handler has any different terms, and it did not. Thanks Viswa Date: Wed, 4 May 2011 08:17:39 -0400 Subject: Re: Solr Terms and Date field issues From: erickerick...@gmail.com To: solr-user@lucene.apache.org Hmmm, this *looks* like you've changed your schema without re-indexing all your data, so you're getting old (string?) values in that field, but that's just a guess. If this is really happening on a clean index it's a problem. I'm also going to guess that you're not really deleting the documents you think. Are you committing after the deletes? Best Erick On Wed, May 4, 2011 at 2:18 AM, Viswa S svis...@hotmail.com wrote: Hello, The terms query for a date field seems to get populated with some weird dates; many of these dates (1970, 2009, 2011-04-23) are not present in the indexed data.
Please see sample data below. I also notice that a delete and optimize does not remove the relevant terms for date fields; the string fields seem to work fine. Thanks Viswa Results from Terms component:

<int name="2011-05-04T02:01:32.928Z">3479</int>
<int name="2011-05-04T02:00:19.2Z">3479</int>
<int name="2011-05-03T22:34:58.432Z">3479</int>
<int name="2011-04-23T01:36:14.336Z">3479</int>
<int name="2009-03-13T13:23:01.248Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="1970-01-01T00:00:00Z">3479</int>
<int name="2011-05-04T02:01:34.592Z">265</int>

Result from facet component, rounded to seconds:

<lst name="InsertTime">
  <int name="2011-05-04T02:01:32Z">1</int>
  <int name="2011-05-04T02:01:33Z">1148</int>
  <int name="2011-05-04T02:01:34Z">2333</int>
  <str name="gap">+1SECOND</str>
  <date name="start">2011-05-03T06:14:14Z</date>
  <date name="end">2011-05-04T06:14:14Z</date>
</lst>
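The suggested workaround, copying the trie date into a plain string field so the terms component sees clean values, might look like this in schema.xml (field and type names here are illustrative, not from the thread):

```xml
<field name="insert_time" type="tdate" indexed="true" stored="true"/>
<!-- String copy of the date, used only by the terms component. -->
<field name="insert_time_str" type="string" indexed="true" stored="false"/>
<copyField source="insert_time" dest="insert_time_str"/>
```

Terms queries against insert_time_str then return the literal date strings, while range queries and faceting stay on the efficient tdate field.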
Re: Indexing 20M documents from MySQL with DIH
I am running into this problem as well, but only sporadically, and only in my 3.1 test environment, not 1.4.1 production. I may have narrowed things down; I am now interested in learning whether this is a problem with the MySQL connector or DIH. On 4/21/2011 6:09 PM, Scott Bigelow wrote: Thanks for the e-mail. I probably should have provided more details, but I was more interested in making sure I was approaching the problem correctly (using DIH, with one big SELECT statement for millions of rows) instead of solving this specific problem. Here's a partial stack trace from this specific problem:

... Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
 at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539)
 at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989)
 ... 22 more
Apr 21, 2011 3:53:28 AM org.apache.solr.handler.dataimport.EntityProcessorBase getNext
SEVERE: getNext() failed for query 'REDACTED'
org.apache.solr.handler.dataimport.DataImportHandlerException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 128 milliseconds ago. The last packet sent successfully to the server was 25,273,484 milliseconds ago. ...
Re: Indexing 20M documents from MySQL with DIH
Alex, thanks for your response. I suspect you're right about autoCommit; I ended up solving the problem by merely moving the entire Solr install, untouched, to a significantly larger instance (EC2 m1.small to m1.large). I think it is appropriately sized now for the quantity and intensity of queries that will be thrown at it when it enters production, so I never bothered to get it working on the smaller instance. Your entity examples are interesting; I wonder if you could create some count table to make up for MySQL's lack of a row generator. Either way, it seems like paging through results would be a must-have for any enterprise-level indexer, and I'm surprised to find it missing in Solr. When relying on the delta import mechanism for updates, it's not like one would need the consistency of pulling the entire record set as a single, isolated query, since the delta import is designed to fetch new documents and merge them into a slightly out-of-date/inconsistent index. On Thu, May 5, 2011 at 12:10 PM, Alexey Serba ase...@gmail.com wrote: {quote} ... Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost. at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989) ... 22 more Apr 21, 2011 3:53:28 AM org.apache.solr.handler.dataimport.EntityProcessorBase getNext SEVERE: getNext() failed for query 'REDACTED' org.apache.solr.handler.dataimport.DataImportHandlerException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure The last packet successfully received from the server was 128 milliseconds ago. The last packet sent successfully to the server was 25,273,484 milliseconds ago. ... {quote} It could probably be because of autocommit / segment merging.
You could try to disable autocommit / increase mergeFactor. {quote} I've used Sphinx in the past, which uses multiple queries to pull out a subset of records ranged based on the primary key; does Solr offer functionality similar to this? It seems that once a Solr index gets to a certain size, the indexing of a batch takes longer than MySQL's net_write_timeout, so it kills the connection. {quote} I was thinking about some hackish solution to paginate results:

<entity name="pages" query="SELECT id FROM generate_series( (SELECT count(*) FROM source_table) / 1000 )">
  <entity name="records" query="SELECT * FROM source_table LIMIT 1000 OFFSET ${pages.id}*1000"/>
</entity>

Or something along those lines (you'd need to calculate the offset in the pages query). But unfortunately MySQL does not provide a generate_series function (it's a Postgres function, and there are similar solutions for Oracle and MSSQL). On Mon, Apr 25, 2011 at 3:59 AM, Scott Bigelow eph...@gmail.com wrote: Thank you everyone for your help. I ended up getting the index to work using the exact same config file on a (substantially) larger instance. On Fri, Apr 22, 2011 at 5:46 AM, Erick Erickson erickerick...@gmail.com wrote: {{{A custom indexer, so that's a fairly common practice? So when you are dealing with these large indexes, do you try not to fully rebuild them when you can? It's not a nightly thing, but something to do in case of a disaster? Is there a difference in the performance of an index that was built all at once vs. one that has had delta inserts and updates applied over a period of months?}}} Is it a common practice? Like all of this, it depends. It's certainly easier to let DIH do the work. Sometimes DIH doesn't have all the capabilities necessary. Or as Chris said, in the case where you already have a system built up and it's easier to just grab the output from that and send it to Solr, perhaps with SolrJ, and not use DIH. Some people are just more comfortable with their own code...
Do you try not to fully rebuild. It depends on how painful a full rebuild is. Some people just like the simplicity of starting over every day/week/month. But you *have* to be able to rebuild your index in case of disaster, and a periodic full rebuild certainly keeps that process up to date. Is there a difference...delta inserts...updates...applied over months. Not if you do an optimize. When a document is deleted (or updated), it's only marked as deleted. The associated data is still in the index. Optimize will reclaim that space and compact the segments, perhaps down to one. But there's no real operational difference between a newly-rebuilt index and one that's been optimized. If you don't delete/update, there's not much reason to optimize either I'll leave the DIH to others.. Best Erick On Thu, Apr 21, 2011 at 8:09 PM, Scott Bigelow eph...@gmail.com wrote: Thanks for the e-mail. I probably should have provided more details, but I was more interested in making sure I was approaching the problem correctly (using DIH, with
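The paging idea discussed in this thread, splitting one huge SELECT into LIMIT/OFFSET batches, can also be driven from a custom indexer instead of DIH. A hypothetical Python generator (the table name and page size are placeholders):

```python
def paged_queries(total_rows, page_size=1000, table="source_table"):
    """Emit the LIMIT/OFFSET statements a paging importer would run in order."""
    # Ceiling division: a final partial page still needs its own query.
    pages = (total_rows + page_size - 1) // page_size
    for page in range(pages):
        yield (f"SELECT * FROM {table} "
               f"LIMIT {page_size} OFFSET {page * page_size}")

queries = list(paged_queries(2500, page_size=1000))
```

Each short query finishes well inside MySQL's net_write_timeout, which is the failure mode the long single-query import keeps hitting. Note that OFFSET paging degrades on large tables; ranging on the primary key (WHERE id > last_seen) is the usual refinement.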
Re: Testing the limits of non-Java Solr
Yeah you don't need Java to use Solr. PHP, Curl, Python, HTTP Request APIs all work fine. The purpose of Solr is to wrap Lucene into a REST-like API that anyone can call using HTTP. On Thu, May 5, 2011 at 4:35 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Short answer: Yes, you can deploy a Solr cluster and write an application that talks to it without writing any Java (but it may be PHP or Python or unless that application is you typing telnet my-solr-server 8983 ) Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Jack Repenning jrepenn...@collab.net To: solr-user@lucene.apache.org Sent: Thu, May 5, 2011 6:28:31 PM Subject: Testing the limits of non-Java Solr What's the probability that I can build a non-trivial Solr app without writing any Java? I've been planning to use Solr, Lucene, and existing plug-ins, and sort of hoping not to write any Java (the app itself is Ruby / Rails). The dox (such as http://wiki.apache.org/solr/FAQ) seem encouraging. [I *can* write Java, but my planning's all been no Java.] I'm just beginning the design work in earnest, and I suddenly notice that it seems every mail thread, blog, or example starts out Java-free, but somehow ends up involving Java code. I'm not sure I yet understand all these snippets; conceivably some of the Java I see could just as easily be written in another language, but it makes me wonder. Is it realistic to plan a sizable Solr application without some Java programming? I know, I know, I know: everything depends on the details. I'd be interested even in anecdotes: has anyone ever achieved this before? Also, what are the clues I should look for that I need to step into the Java realm? I understand, for example, that it's possible to write filters and tokenizers to do stuff not available in any standard one; in this case, the clue would be I can't find what I want in the standard list, I guess. 
Are there other things I should look for? -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
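As the answers in this thread say, talking to Solr without Java is just HTTP. A small Python sketch that builds a /select URL (the server address and parameters are illustrative; any client, curl or Ruby included, could fetch the same URL):

```python
from urllib.parse import urlencode
from urllib.request import urlopen  # only needed for the live fetch below

def solr_select_url(base="http://localhost:8983/solr", **params):
    """Build a Solr /select URL; wt=json asks for a JSON response."""
    params.setdefault("wt", "json")
    return f"{base}/select?{urlencode(params)}"

url = solr_select_url(q="title:solr", rows=10)
# Against a live server you would then do e.g.:
# import json; docs = json.load(urlopen(url))["response"]["docs"]
```

The whole client-side surface is URL construction plus response parsing, which is why Ruby/Rails apps can stay Java-free until a custom analyzer or tokenizer is needed.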
Re: fast case-insensitive autocomplete
Are you giving that solution away? What are the costs? etc!! On Thu, May 5, 2011 at 2:58 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, I haven't used Suggester yet, but couldn't you feed it all-lowercase content and then lowercase whatever the user is typing before sending it to Suggester, to avoid case mismatch? Autocomplete on http://search-lucene.com/ uses http://sematext.com/products/autocomplete/index.html if you want a shortcut. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Kusenda, Brandyn J brandyn-kuse...@uiowa.edu To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Thu, May 5, 2011 9:22:03 AM Subject: fast case-insensitive autocomplete Hi. I need an autocomplete solution to handle case-insensitive queries but return the original text with the case still intact. I've experimented with both the Suggester and TermComponent methods. TermComponent works when I use the regex option; however, it is far too slow. I get the speed I want by using term.prefix or the Suggester, but it's case sensitive. Here is an example operating on a user directory: Query: bran Results: Branden Smith, Brandon Thompson, Brandon Verner, Brandy Finny, Brian Smith, ... A solution that I would expect to work would be to store two fields: one containing the original text and the other containing the lowercase. Then convert the query to lower case, run the query against the lower-case field, and return the original (case-preserved) field. Unfortunately, I can't get a TermComponent query to return additional fields; it only returns the field it's searching against. Should this work, or can I only return additional fields for standard queries? Thanks in advance, Brandyn
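The two-field approach Brandyn describes, matching on a lowercased key but returning the original-case value, can be prototyped outside Solr to check the behavior. A toy in-memory sketch (names taken from the example in the thread):

```python
from bisect import bisect_left

class Autocomplete:
    """Index lowercase keys but return the original-case names."""

    def __init__(self, names):
        # Store (lowercase key, original) pairs sorted by the key.
        self.pairs = sorted((n.lower(), n) for n in names)
        self.keys = [k for k, _ in self.pairs]

    def suggest(self, prefix, limit=10):
        p = prefix.lower()
        i = bisect_left(self.keys, p)  # first key >= prefix
        out = []
        while i < len(self.keys) and self.keys[i].startswith(p) and len(out) < limit:
            out.append(self.pairs[i][1])
            i += 1
        return out

ac = Autocomplete(["Branden Smith", "Brandon Thompson",
                   "Brandon Verner", "Brandy Finny", "Brian Smith"])
```

In Solr terms this is the lowercase field driving the prefix lookup and the stored original field supplying the display value; the binary search mirrors what a sorted term dictionary gives you for term.prefix.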
Re: Does the Solr enable Lemmatization [not the Stemming]
Is there a parser that can take a string and tell you what part is an address, and what is not? Split the field into 2 fields? Search: Dr. Bell in Denver, CO Search: Dr. Smith near 10722 Main St, Denver, CO Search: Denver, CO for Cardiologist Thoughts? 2011/5/5 François Schiettecatte fschietteca...@gmail.com: Rajani You might also want to look at Balie ( http://balie.sourceforge.net/ ), from the web site: Features: • language identification • tokenization • sentence boundary detection • named-entity recognition Can't vouch for it though. On May 5, 2011, at 4:58 AM, Jan Høydahl wrote: Hi, Solr does not have lemmatization out of the box. You'll have to find 3rd party analyzers, and the most known such is from BasisTech. Please contact them to learn more. I'm not aware of any open source lemmatizers for Solr. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 5. mai 2011, at 10.34, rajini maski wrote: Does the solr enable lemmatization concept? I found a documentation that gives an information as solr enables lemmatization concept. Here is the link : http://www.basistech.com/knowledge-center/search/2010-09-language-identification-language-support-and-entity-extraction.pdf Can anyone help me finding the jar specified in that document so that i can add it as plugin. jar :rlp.solr.RLPTokenizerFactory Thanks and Regards, Rajani Maski
DIH disconnecting long-lived MySQL connections
I am using DIH with the MySQL connector to import data into my index. When doing a full import in my 3.1 test environment, it sometimes loses connection with the database and ends up rolling back the import. My import configuration uses a single query, so there's no possibility of a reconnect fixing this. Visit http://pastebin.com/Ya9DBMEP for the error log. I'm using mysql-connector-java-5.1.15-bin.jar. It seems that this occurs when Solr is busy doing multiple segment merges: when there are two merges partially complete and it's working on a third, ongoing index activity ceases for several minutes. Indexing activity seems to be fine up until there are three merges in progress. This is a virtual environment using Xen on CentOS5, two VMs. The host has SATA RAID1, so there's not a lot of I/O capacity. When both virtual machines are busy indexing, it can't keep up with the load, and one segment merge doesn't have time to complete before it's built up enough segments to start another one, which puts the first one on hold. If I build one virtual machine at a time, it doesn't do this, but then it takes twice as long. My 1.4.1 production system builds all six shards at the same time when it's doing a full rebuild, but that's using RAID10. I grabbed a sniffer trace of the MySQL connection from the database server. After the last actual data packet in the capture, there is a 173-second pause followed by a Request Quit packet from the VM, then the connection is torn down normally. My best guess right now is that the idle-timeout-minutes setting in JDBC is coming into play here during my single query, and that it's set to 3 minutes. The Internet cannot seem to tell me what the default value is for this setting, and I do not see it mentioned anywhere in the MySQL/J source code. I tried adding idle-timeout-minutes=30 to the datasource definition in my DIH config; it didn't seem to do anything. Am I on the right track?
Is there any way to configure DIH so that it won't do this? Thanks, Shawn
RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:
Hi Craig, Thanks for the response. Actually what we need to achieve is to see group-by results based on dates, like:

2011-01-01  23
2011-01-02  14
2011-01-03  40
2011-01-04  10

Now the records in my table run into millions; grouping the result based on the UTC date would not produce the right result, since the result should be grouped on the user's timezone. Is there any way we can achieve this in Solr? Regards, Rohit -Original Message- From: Craig Stires [mailto:craig.sti...@gmail.com] Sent: 06 May 2011 04:30 To: solr-user@lucene.apache.org Subject: RE: Solr: org.apache.solr.common.SolrException: Invalid Date String: Rohit, The Solr server using TrieDateField must receive values in the format 2011-01-07T17:00:30Z. This should be a UTC-based datetime. The offset can be applied once you get your results back from Solr:

SimpleDateFormat df = new SimpleDateFormat(format);
df.setTimeZone(TimeZone.getTimeZone("IST"));
java.util.Date dateunix = df.parse(datetime);

-Craig -Original Message- From: Rohit [mailto:ro...@in-rev.com] Sent: Friday, 6 May 2011 2:31 AM To: solr-user@lucene.apache.org Subject: Solr: org.apache.solr.common.SolrException: Invalid Date String: Hi, I am new to Solr and this is my first attempt at indexing Solr data. I am getting the following exception while indexing:

org.apache.solr.common.SolrException: Invalid Date String:'2011-01-07'
 at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
 at org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:169)
 at org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
 at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
 at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)

I understand from reading some articles that Solr stores time only in UTC. This is the query I am using to index:

Select id, text, 'language', links, tweetType, source, location, bio, url, utcOffset, timeZone, frenCnt, createdAt, createdOnGMT, createdOnServerTime, follCnt, favCnt, totStatusCnt, usrCrtDate, humanSentiment, replied, replyMsg, classified, locationDetail, geonameid, country, continent, placeLongitude, placeLatitude, listedCnt, hashtag, mentions, senderInfScr, createdOnGMTDate,
DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') as IST,
DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+01:00'),'%Y-%m-%d') as ECT,
DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+02:00'),'%Y-%m-%d') as EET,
DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+03:30'),'%Y-%m-%d') as MET,
sign(classified) as sentiment
from

Why I am doing this timezone conversion is because I need to group results by the user timezone. How can I achieve this? Regards, Rohit
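Rohit's grouping problem, counting documents per calendar day in the user's timezone rather than UTC, can also be handled client-side once UTC timestamps come back from Solr. A hedged Python sketch (the IST offset is just one example; the sample timestamps are made up):

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # one example user offset

def group_by_local_date(utc_timestamps, tz):
    """Count documents per calendar day in the user's timezone, not UTC."""
    return Counter(
        ts.replace(tzinfo=timezone.utc).astimezone(tz).date().isoformat()
        for ts in utc_timestamps
    )

# 20:00 UTC on Jan 1 is already Jan 2 in IST, so both land on the same day.
counts = group_by_local_date(
    [datetime(2011, 1, 1, 20, 0), datetime(2011, 1, 2, 3, 0)], IST)
```

This is the post-processing counterpart of Craig's SimpleDateFormat advice: keep the index in UTC, apply the user's offset only at display/aggregation time.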