DIH delta-import question
Dear list, I'm trying to delta-import with datasource FileDataSource and processor FileListEntityProcessor. I want to load only files which are newer than the last_index_time stored in dataimport.properties. It looks like newerThan=${dataimport.last_index_time} has no effect. Can it be that newerThan is configured under FileListEntityProcessor but applied to the next entity processor rather than to FileListEntityProcessor itself? In my case that is the XPathEntityProcessor, which doesn't support newerThan. Version is Solr 4.0 from trunk. Regards, Bernd
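For reference, the kind of config being described looks roughly like this (a sketch with invented entity names and baseDir; note that the DIH config quoted later in this digest references the last index time with a dataimporter prefix, not dataimport):

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <!-- outer entity lists files newer than the last index run -->
      <entity name="files"
              processor="FileListEntityProcessor"
              baseDir="/path/to/xml"
              fileName=".*\.xml$"
              newerThan="${dataimporter.last_index_time}"
              rootEntity="false">
        <!-- inner entity parses each file; XPathEntityProcessor itself has no newerThan -->
        <entity name="doc"
                processor="XPathEntityProcessor"
                url="${files.fileAbsolutePath}"
                forEach="/records/record">
          ...
        </entity>
      </entity>
    </document>
  </dataConfig>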
SolrJ API for multi core?
Hi, Is $subject available? Or do I need to make HTTP GET calls? -- Regards, Tharindu
Re: JVM GC troubles
Hi, I don't run totally OOM (no OOM exceptions in the log) but I constantly garbage collect. While not collecting, the Solr master handles the updates pretty well. Every insert is unique, so I don't have any deletes or optimizes, and all queries are handled by the single slave instance. Is there a way to reduce the objects held in the old gen space? It looks like the JVM is trying to hold as many objects as possible in the cache to provide fast queries, which are not needed in my situation.

Regarding the JBoss ... well, as I said, it's the minimalistic version of it and we use it due to the automation process within our department. In my test env I tried it with a plain Tomcat 6.x but without any improvement, so the JBoss overhead is minimal to nothing. The JVM parameters I wrote are the ones I am struggling with at the moment. I was hoping someone would come up with a hint regarding the solrconfig.xml itself.

PS: if anyone is questioning the implemented architecture (master - slave, configs, schema, etc.) ... it's our architect's fault and I have to operate it ;-)

2010/10/15 Otis Gospodnetic otis_gospodne...@yahoo.com:

Hello, I hope you are not running JBoss just to run Solr - there are simpler containers out there, e.g., Jetty. Do you OOM? Do things look better if you replicate less often (e.g. every 5 minutes instead of every 60 seconds)? Do all/some of those -X__ JVM params actually help?

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message - From: accid ac...@gmx.net To: solr-user@lucene.apache.org Sent: Thu, October 14, 2010 1:25:34 PM Subject: Re: JVM GC troubles

I forgot a few important details:
  solr version = 1.4.1
  current index size = 50 GB, growth ~600 MB/day
  jboss runs with "web" settings (same as minimal)

2010/10/14 ac...@gmx.net:

Hi, as I am new here, I want to say hello and thanks in advance for your help.

HW Setup:
  1x Solr master - Sun Microsystems SUN FIRE X4450 - 4 x 2.93 GHz, 64 GB RAM
  1x Solr slave - Sun Microsystems SUN FIRE X4450 - 4 x 2.93 GHz, 64 GB RAM

SW Setup:
  Solaris 10 Generic_142901-03
  JBoss 5.1.0
  JDK 1.6 update 18

  # Specify the exact Java VM executable to use.
  # JAVA=/opt/appsrv/java6/bin/amd64/java
  #
  # Specify options to pass to the Java VM.
  JAVA_OPTS=-server -Xms6144m -Xmx6144m -Xmn3072m -XX:ThreadStackSize=1024
    -XX:MaxPermSize=512m -Dorg.jboss.resolver.warning=true
    -Dsun.rmi.dgc.client.gcInterval=360 -Dsun.rmi.dgc.server.gcInterval=360
    -Dnetworkaddress.cache.ttl=1800 -XX:+UseConcMarkSweepGC

Solr Setup:
#) the master has to deal with an avg. update rate of 50 updates/s and peaks of 400 updates/s
#) the slave replicates every 60s using the built-in Solr replication method (NOT rsync)
#) the slave queries are ~20/sec
#) schema.xml:

  <field name="myname1" type="string" indexed="true" stored="false" required="true"/>
  <field name="myname2" type="int" indexed="true" stored="true" required="true"/>
  <field name="myname3" type="int" indexed="true" stored="true" required="true"/>
  <field name="myname4" type="long" indexed="true" stored="true" required="true"/>
  <field name="myname5" type="int" indexed="true" stored="true" required="true"/>
  <field name="myname6" type="string" indexed="true" stored="true" required="true"/>
  <field name="myname7" type="string" indexed="true" stored="false"/>
  <field name="myname8" type="string" indexed="true" stored="false"/>
  <field name="myname9" type="string" indexed="true" stored="false"/>
  <field name="myname10" type="long" indexed="true" stored="false"/>
  <field name="myname11" type="int" indexed="true" stored="false"/>
  <field name="myname12" type="string" indexed="true" stored="false"/>
  <field name="myname13" type="tdate" indexed="true" stored="false"/>
  <field name="myname14" type="int" indexed="true" stored="false" multiValued="true"/>
  <field name="myname15" type="string" indexed="true" stored="false" multiValued="true"/>
  <field name="myname16" type="int" indexed="true" stored="false" multiValued="true"/>
  <field name="myname17" type="string" indexed="true" stored="false" multiValued="true"/>
  <field name="myname18" type="string" indexed="true" stored="false" multiValued="true"/>
  <field name="myname19" type="string" indexed="true" stored="false" multiValued="true"/>
  <field name="myname20" type="boolean" indexed="true" stored="false"/>
  <field name="myname21" type="int" indexed="true" stored="false" required="true"/>
  <field name="myname22" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>

#) The solrconfig.xml is attached.

Both master and slave suffer from serious performance impacts during garbage collects. I obviously have a GC problem, because ~30 min after startup the old space is full and not being freed up. Below you find a JMX copy/paste of the heap AFTER a garbage collect!! As you can see, even
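Since the question is about reducing old-gen objects via solrconfig.xml: one angle is the built-in Solr caches, which on a master that serves no queries mostly hold objects that will never be read. A hedged sketch of trimmed cache settings (the class names exist in Solr 1.4; the sizes are purely illustrative, not a recommendation):

  <!-- smaller caches and no autowarming: less old-gen retention,
       and commits don't re-populate caches that nothing queries -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>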
How do you programmatically create new cores?
Hi everyone, I'm a newbie at this and I can't figure out how to do it even after going through http://wiki.apache.org/solr/CoreAdmin. Any sample code would help a lot. Thanks in advance. -- Regards, Tharindu
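Not an authoritative answer, but SolrJ does expose the CoreAdmin API. A minimal sketch (the URL and instanceDir are placeholders, and the instance dir must already contain conf/solrconfig.xml and conf/schema.xml; this also covers the multi-core SolrJ question above, since each core just gets its own URL):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.request.CoreAdminRequest;

  public class CreateCoreExample {
      public static void main(String[] args) throws Exception {
          // Point at the container root (no core name), where the CoreAdminHandler lives
          SolrServer adminServer = new CommonsHttpSolrServer("http://localhost:8983/solr");

          // Create a core named "core1" whose instance dir already holds a conf/ directory
          CoreAdminRequest.createCore("core1", "/path/to/solr/core1", adminServer);

          // Queries and updates for the new core go to its own URL
          SolrServer core1 = new CommonsHttpSolrServer("http://localhost:8983/solr/core1");
      }
  }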
Re: SOLRJ - Searching text in all fields of a Bean
Ahmet,

I got it working to an extent. Now:

  SolrQuery query = new SolrQuery();
  query.setQueryType("dismax");
  query.setQuery("kitten");
  query.setParam("qf", "title");
  QueryResponse rsp = server.query(query);
  List<SOLRTitle> beans = rsp.getBeans(SOLRTitle.class);
  System.out.println(beans.size());
  Iterator<SOLRTitle> it = beans.iterator();
  while (it.hasNext()) {
      SOLRTitle solrTitle = (SOLRTitle) it.next();
      System.out.println(solrTitle.id);
      System.out.println(solrTitle.title);
  }

This code is able to find the record, and prints the ID, but fails to print the title. Whereas:

  SolrQuery query = new SolrQuery();
  query.setQuery("title:kitten");
  QueryResponse rsp = server.query(query);
  SolrDocumentList docs = rsp.getResults();
  Iterator<SolrDocument> iter = rsp.getResults().iterator();
  while (iter.hasNext()) {
      SolrDocument resultDoc = iter.next();
      String title = (String) resultDoc.getFieldValue("title");
      String id = (String) resultDoc.getFieldValue("id"); // id is the uniqueKey field
      System.out.println(id);
      System.out.println(title);
  }

This query succeeds! What am I doing wrong in the dismax params? The title field is being fetched as null.

Regards,
Subhash Bhushan.

On Fri, Oct 8, 2010 at 2:05 PM, Ahmet Arslan iori...@yahoo.com wrote:

I have two fields in the bean class, id and title. After adding the bean to SOLR, I want to search for, say "kitten", in all defined fields in the bean, like this:

  query.setQuery("kitten");

But I get results only when I affix the bean field name before the search text, like this:

  query.setQuery("title:kitten");

Same case even when I use SolrInputDocument and add these fields. Can we search text in all fields of a bean, without having to specify a field?

With dismax, you can query several fields using different boosts. http://wiki.apache.org/solr/DisMaxQParserPlugin
problem on running fullimport
Hi, I am using the full import option with the data-config file as mentioned below:

  <dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql:///xxx" user="xxx" password="xx"/>
    <document>
      <entity name="yyy" query="select studentName from test1">
        <field column="studentName" name="studentName"/>
      </entity>
    </document>
  </dataConfig>

On running the full-import option I am getting the error mentioned below. I had already included the dataimport.properties file in my conf directory. Help me to get the issue resolved.

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">334</int>
    </lst>
    <lst name="initArgs">
      <lst name="defaults">
        <str name="config">data-config.xml</str>
      </lst>
    </lst>
    <str name="command">full-import</str>
    <str name="mode">debug</str>
    <null name="documents"/>
    <lst name="verbose-output">
      <lst name="entity:test1">
        <lst name="document#1">
          <str name="query">select studentName from test1</str>
          <str name="EXCEPTION">
  org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select studentName from test1 Processing Document # 1
      at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
      at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
      at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
      at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
      at org.apache.solr.handler.dataimport.DebugLogger$2.getData(DebugLogger.java:184)
      at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
      at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
      at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
      at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
      at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
      at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
      at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
      at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
      at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
      at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
      at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
      at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
      at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
      at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
      at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
      at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
      at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
      at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
      at org.mortbay.jetty.Server.handle(Server.java:285)
      at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
      at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
      at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
      at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
      at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
      at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
      at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
  Caused by: java.sql.SQLException: Illegal value for setFetchSize().
      at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075)
      at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
      at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:984)
      at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:929)
      at com.mysql.jdbc.StatementImpl.setFetchSize(StatementImpl.java:2496)
      at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:242)
      ... 33 more
          </str>
          <str name="time-taken">0:0:0.50</str>
        </lst>
      </lst>
    </lst>
    <str name="status">idle</str>
    <str name="importResponse">Configuration Re-loaded sucessfully</str>
    <lst name="statusMessages">
      <str name="Time Elapsed">0:0:0.299</str>
      <str name="Total Requests made to DataSource">1</str>
      <str name="Total Rows Fetched">0</str>
      <str name="Total Documents Processed">0</str>
      <str
Re: problem on running fullimport
On Fri, Oct 15, 2010 at 7:42 AM, swapnil dubey swapnil.du...@gmail.com wrote:

Hi, I am using the full import option with the data-config file as mentioned below:

  <dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql:///xxx" user="xxx" password="xx"/>
    <document>
      <entity name="yyy" query="select studentName from test1">
        <field column="studentName" name="studentName"/>
      </entity>
    </document>
  </dataConfig>

On running the full-import option I am getting the error mentioned below. I had already included the dataimport.properties file in my conf directory. Help me to get the issue resolved.

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">334</int>
    </lst>
    <lst name="initArgs">
      <lst name="defaults">
        <str name="config">data-config.xml</str>
      </lst>
    </lst>
    <str name="command">full-import</str>
    <str name="mode">debug</str>
    <null name="documents"/>
    <lst name="verbose-output">
      <lst name="entity:test1">
        <lst name="document#1">
          <str name="query">select studentName from test1</str>
          <str name="EXCEPTION">
  org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select studentName from test1 Processing Document # 1 ...

-- Regards Swapnil Dubey

Swapnil,

Everything looks fine, except that in your entity definition you forgot to define which data source you wish to use. So if you add dataSource="JdbcDataSource" to the entity, that should get rid of your exception.

As a reminder, the DataImportHandler wiki (http://wiki.apache.org/solr/DataImportHandler) on Apache's website is very helpful for learning how to use the DIH properly. I have found it useful to keep a printed copy beside me for easy and quick reference.

- Ken
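Applying Ken's suggestion, the config would look roughly like this (a sketch; giving the dataSource element a name for the entity to reference is an assumption on top of Ken's reply, not something he spelled out):

  <dataConfig>
    <dataSource name="JdbcDataSource" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql:///xxx" user="xxx" password="xx"/>
    <document>
      <!-- the entity now says explicitly which data source it reads from -->
      <entity name="yyy" dataSource="JdbcDataSource" query="select studentName from test1">
        <field column="studentName" name="studentName"/>
      </entity>
    </document>
  </dataConfig>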
Re: SOLRJ - Searching text in all fields of a Bean
Hi Savvas,

Thanks!! I was able to search using the copyField directive. I was using the default example schema packaged with Solr. I added the following directive for the title field and reindexed the data:

  <copyField source="title" dest="text"/>

Regards,
Subhash Bhushan.

On Fri, Oct 8, 2010 at 2:09 PM, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote:

Hello, What does your schema look like? Have you defined a catch-all field and do you copy every value from all your other fields into it with a copyField directive?
Cheers,
-- Savvas

On 8 October 2010 08:30, Subhash Bhushan subhash.bhus...@stratalabs.in wrote:

Hi, I have two fields in the bean class, id and title. After adding the bean to SOLR, I want to search for, say "kitten", in all defined fields in the bean, like this:

  query.setQuery("kitten");

But I get results only when I affix the bean field name before the search text, like this:

  query.setQuery("title:kitten");

Same case even when I use SolrInputDocument and add these fields. Can we search text in all fields of a bean, without having to specify a field? If we can, what am I missing in my code?

Code:

Bean:

  public class SOLRTitle {
      @Field public String id = "";
      @Field public String title = "";
  }

Indexing function:

  private static void uploadData() {
      try {
          ... // Get Titles
          List<SOLRTitle> solrTitles = new ArrayList<SOLRTitle>();
          Iterator<Title> it = titles.iterator();
          while (it.hasNext()) {
              Title title = (Title) it.next();
              SOLRTitle solrTitle = new SOLRTitle();
              solrTitle.id = title.getID().toString();
              solrTitle.title = title.getTitle();
              solrTitles.add(solrTitle);
          }
          server.addBeans(solrTitles);
          server.commit();
      } catch (SolrServerException e) {
          e.printStackTrace();
      } catch (IOException e) {
          e.printStackTrace();
      }
  }

Querying function:

  private static void queryData() {
      try {
          SolrQuery query = new SolrQuery();
          query.setQuery("kitten");
          QueryResponse rsp = server.query(query);
          List<SOLRTitle> beans = rsp.getBeans(SOLRTitle.class);
          System.out.println(beans.size());
          Iterator<SOLRTitle> it = beans.iterator();
          while (it.hasNext()) {
              SOLRTitle solrTitle = (SOLRTitle) it.next();
              System.out.println(solrTitle.id);
              System.out.println(solrTitle.title);
          }
      } catch (SolrServerException e) {
          e.printStackTrace();
      }
  }

-- Subhash Bhushan.
Re: Quick question on indexing an existing index
Why don't you simply index the source content which you used to build index2 into index1, i.e. have your tool index to both? You won't save anything by trying to extract that content from an existing index. But of course, you COULD write yourself a tool which extracts all stored fields for all documents in index2, transforms them into docs which fit in index1, and then inserts them. But how will you support deletes etc.?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 14. okt. 2010, at 17.06, bbarani wrote:

Hi, I have a very simple question about indexing an existing index. We have two indexes: index1 is maintained by us (it indexes the data from a database) and index2 is maintained by a tool. The schemas are totally different, but we want to re-index the data present in index2 into index1, so that we end up with just one single index (index1) containing the data from both. We want to re-index the content of index2 using the schema present for index1. Also we are interested in customizing the data (something like selecting columns/fields from the DB using the DB import handler).

Thanks, BB
--
View this message in context: http://lucene.472066.n3.nabble.com/Quick-question-on-indexing-an-existing-index-tp1701663p1701663.html
Sent from the Solr - User mailing list archive at Nabble.com.
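A hedged sketch of the extract-and-reinsert tool Jan describes, assuming everything you need from index2 is stored (URLs, field names, and the mapping are placeholders; it also assumes index2 is not being modified while paging through it):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrDocumentList;
  import org.apache.solr.common.SolrInputDocument;

  public class ReindexTool {
      public static void main(String[] args) throws Exception {
          SolrServer source = new CommonsHttpSolrServer("http://host1:8983/solr"); // index2
          SolrServer target = new CommonsHttpSolrServer("http://host2:8983/solr"); // index1
          int rows = 500;
          for (int start = 0; ; start += rows) {
              // page through every document in index2
              SolrQuery q = new SolrQuery("*:*").setStart(start).setRows(rows);
              SolrDocumentList page = source.query(q).getResults();
              if (page.isEmpty()) break;
              for (SolrDocument doc : page) {
                  // transform: map index2's stored fields onto index1's schema
                  SolrInputDocument out = new SolrInputDocument();
                  out.addField("id", doc.getFieldValue("id"));
                  out.addField("title", doc.getFieldValue("title")); // hypothetical mapping
                  target.add(out);
              }
          }
          target.commit();
      }
  }

As Jan notes, this still leaves deletes unsolved: nothing here notices documents removed from index2.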
Exception being thrown indexing a specific pdf document using Solr Cell
I've got an existing Spring/SolrJ application that indexes a mixture of documents. It seems to have been working fine for a couple of weeks, but today I've just started getting an exception when processing a certain PDF file. The exception is:

  ERROR: org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@4683c2
      at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
      at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
      at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
      at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
      at uk.co.sjp.intranet.service.SolrServiceImpl.loadDocuments(SolrServiceImpl.java:308)
      at uk.co.sjp.intranet.SearchController.loadDocuments(SearchController.java:297)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.doInvokeMethod(HandlerMethodInvoker.java:710)
      at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:167)
      at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:414)
      at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:402)
      at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:771)
      at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:716)
      at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:647)
      at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:552)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
      at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:630)
      at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:436)
      at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:374)
      at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:302)
      at org.tuckey.web.filters.urlrewrite.NormalRewrittenUrl.doRewrite(NormalRewrittenUrl.java:195)
      at org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:159)
      at org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:141)
      at org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:90)
      at org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:417)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
      at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
      at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
      at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
      at java.lang.Thread.run(Thread.java:619)
  Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@4683c2
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
      at
Re: Term is duplicated when updating a document
Which fields are modified when the document is updated/replaced? Are there any differences in the content of the fields that you are using for the AutoSuggest? Have you changed your schema.xml file recently? If you have, then there may have been changes in the way these fields are analyzed and broken down to terms. This may be a bug if you did not change the field or the schema file but the term count is changing.

On Fri, Oct 15, 2010 at 9:14 AM, Thomas Kellerer spam_ea...@gmx.net wrote:

Hi, we are updating our documents (which represent products in our shop) when a dealer modifies them, by calling SolrServer.add(SolrInputDocument) with the updated document. My understanding is that there is no other way of updating an existing document. However, we also use a terms query to autocomplete the search field for the user, but each time a document is updated (added) the term count is incremented. So after starting with a new index the count is e.g. 1; then the document (that contains that term) is updated, and the count is 2; the next update will set it to 3, and so on. Once the index is optimized (by calling SolrServer.optimize()) the count is correct again. Am I missing something, or is this a bug in Solr/Lucene?

Thanks in advance
Thomas

--
°O° "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Possible to sort by explicit docid order?
Hi,

In an online bookstore project I'm working on, most frontend widgets are search driven. Most often they query with some filters and a sort order, such as "availabledate desc", or simply by score. However, to allow editorial control, some widgets will display a fixed list of books, defined as an ordered list of ISBN numbers inserted by the editor. Based on this we do a Solr search to fetch the data to display:

  fq=isbn:(9788200011699 OR 9788200012658 OR ...)

It is important to return the results in the same order as the explicitly given list of ISBNs. But I cannot see a way to do that, not even with sort by function. So currently we re-order the result list in the frontend.

Would it make sense to add an explicit sort order, perhaps implemented as a function?

  sort=fieldvaluelist(isbn,1000,1,0,$isbnorder) desc, price asc
  isbnorder=9788200011699,9788200012658,9788200013839,9788200014140

The function would be defined as

  fieldvaluelist(field,startvalue,gap,fallback,field-value[,field-value...])

The output of the example above would be:

  For the document with ISBN=9788200011699: 1000
  For the document with ISBN=9788200012658: 999
  For the document with ISBN=9788200013839: 998
  For a document with an ISBN not in the list: 0 (fallback - in which case the second sort order would kick in)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
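A hedged sketch of the frontend re-ordering mentioned above, in case anyone wants the same workaround (the isbn field name is a placeholder):

  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrDocumentList;

  public class IsbnOrder {
      /** Re-order Solr results to match the editor-supplied ISBN list. */
      public static List<SolrDocument> reorder(SolrDocumentList results, List<String> isbnOrder) {
          // index the results by ISBN
          Map<String, SolrDocument> byIsbn = new HashMap<String, SolrDocument>();
          for (SolrDocument doc : results) {
              byIsbn.put((String) doc.getFieldValue("isbn"), doc);
          }
          // walk the editorial list and pick documents in that order
          List<SolrDocument> ordered = new ArrayList<SolrDocument>();
          for (String isbn : isbnOrder) {
              SolrDocument doc = byIsbn.get(isbn);
              if (doc != null) ordered.add(doc); // skip ISBNs missing from the index
          }
          return ordered;
      }
  }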
Re: Term is duplicated when updating a document
Thanks for the answer.

> Which fields are modified when the document is updated/replaced?
Only one field was changed, but it was not the one where the auto-suggest term is coming from.

> Are there any differences in the content of the fields that you are using for the AutoSuggest?
No.

> Have you changed your schema.xml file recently? If you have, then there may have been
> changes in the way these fields are analyzed and broken down to terms.
No, I did a complete index rebuild to rule out things like that. Then after startup I did a search, then updated the document and did a search again.

Regards
Thomas

> This may be a bug if you did not change the field or the schema file but the term count is changing.
>
> On Fri, Oct 15, 2010 at 9:14 AM, Thomas Kellerer spam_ea...@gmx.net wrote:
>
> Hi, we are updating our documents (which represent products in our shop) when a dealer modifies them, by calling SolrServer.add(SolrInputDocument) with the updated document. My understanding is that there is no other way of updating an existing document. However, we also use a terms query to autocomplete the search field for the user, but each time a document is updated (added) the term count is incremented. So after starting with a new index the count is e.g. 1; then the document (that contains that term) is updated, and the count is 2; the next update will set it to 3, and so on. Once the index is optimized (by calling SolrServer.optimize()) the count is correct again. Am I missing something, or is this a bug in Solr/Lucene?
>
> Thanks in advance
> Thomas
Re: searching while importing
On Thu, Oct 14, 2010 at 4:08 AM, Shawn Heisey s...@elyograg.org wrote: If you are using the DataImportHandler, you will not be able to search new data until the full-import or delta-import is complete and the update is committed. When I do a full reindex, it takes about 5 hours, and until it is finished, I cannot search it. I have not tried to issue a manual commit in the middle of an import to see whether that makes data inserted up to that point searchable, but I would not expect that to work. [...] Just as a data point, we have done this, and yes it is possible to do a commit in the middle of an import, and have the documents that have already been indexed be available for search. Regards, Gora
filter query from external list of Solr unique IDs
At the Lucene Revolution conference I asked about efficiently building a filter query from an external list of Solr unique ids. Some use cases I can think of are:

1) personal sub-collections (in our case a user can create a small subset of our 6.5 million doc collection and then run filter queries against it)
2) tagging documents
3) access control lists
4) anything that needs complex relational joins
5) a sort of alternative to incremental field updating (i.e. update in an external database or kv store)
6) Grant's clustering cluster points and similar apps

Grant pointed to SOLR-1715, but when I looked on JIRA, there doesn't seem to be any work on it yet. Hoss mentioned a couple of ideas:

1) sub-classing the query parser
2) having the app query a database and somehow passing something to Solr or Lucene for the filter query

Can Hoss or someone else point me to more detailed information on what might be involved in the two ideas listed above? Is somehow keeping an up-to-date map of unique Solr ids to internal Lucene ids needed to implement this, or is that a separate issue?

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search
RE: filter query from external list of Solr unique IDs
Definitely interested in this. The naive obvious approach would be just putting all the IDs in the query, like fq=(id:1 OR id:2 OR ...), or making it another clause in the 'q'. Can you outline what's wrong with this approach, to make it clearer what's needed in a solution?

________________________________________
From: Burton-West, Tom [tburt...@umich.edu]
Sent: Friday, October 15, 2010 11:49 AM
To: solr-user@lucene.apache.org
Subject: filter query from external list of Solr unique IDs

At the Lucene Revolution conference I asked about efficiently building a filter query from an external list of Solr unique ids. Some use cases I can think of are:

1) personal sub-collections (in our case a user can create a small subset of our 6.5 million doc collection and then run filter queries against it)
2) tagging documents
3) access control lists
4) anything that needs complex relational joins
5) a sort of alternative to incremental field updating (i.e. update in an external database or kv store)
6) Grant's clustering cluster points and similar apps

Grant pointed to SOLR-1715, but when I looked on JIRA, there doesn't seem to be any work on it yet. Hoss mentioned a couple of ideas: 1) sub-classing the query parser, 2) having the app query a database and somehow passing something to Solr or Lucene for the filter query.

Can Hoss or someone else point me to more detailed information on what might be involved in the two ideas listed above? Is somehow keeping an up-to-date map of unique Solr ids to internal Lucene ids needed to implement this, or is that a separate issue?

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search
Re: filter query from external list of Solr unique IDs
On Fri, Oct 15, 2010 at 11:49 AM, Burton-West, Tom tburt...@umich.edu wrote:
> At the Lucene Revolution conference I asked about efficiently building a filter query
> from an external list of Solr unique ids.

Yeah, I've thought about a special query parser and query to deal with this (relatively) efficiently, both from a query perspective and a memory perspective. Should be pretty quick to throw together:

- comma separated list of terms (unique ids are a special case of this)
- in the query, store as a single byte array for efficiency
- sort the ids if they aren't already sorted
- do lookups with a term enumerator and skip weighting or anything else like that
- configurable caching... may, or may not want to cache this big query

That's only part of the stuff you mention, but seems like it would be useful to a number of people.

-Yonik
http://www.lucidimagination.com
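To make the lookup step of that outline concrete, here is a rough sketch against the Lucene 2.9/3.x API of that era: a Filter that walks a pre-sorted id list and ORs the matching docs into a bitset, with no scoring or weighting involved. This is only an illustration of the idea, not the implementation Yonik is describing (the byte-array storage and caching pieces are omitted):

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermDocs;
  import org.apache.lucene.search.DocIdSet;
  import org.apache.lucene.search.Filter;
  import org.apache.lucene.util.OpenBitSet;

  public class IdListFilter extends Filter {
      private final String field;
      private final String[] sortedIds; // pre-sorted unique ids

      public IdListFilter(String field, String[] sortedIds) {
          this.field = field;
          this.sortedIds = sortedIds;
      }

      @Override
      public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
          OpenBitSet bits = new OpenBitSet(reader.maxDoc());
          TermDocs termDocs = reader.termDocs();
          try {
              for (String id : sortedIds) {
                  // sorted ids mean the term dictionary is scanned mostly forward
                  termDocs.seek(new Term(field, id));
                  while (termDocs.next()) {
                      bits.set(termDocs.doc()); // mark every doc holding this id
                  }
              }
          } finally {
              termDocs.close();
          }
          return bits; // OpenBitSet is itself a DocIdSet
      }
  }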
Re: Sorting on arbitrary 'custom' fields
On Mon, Oct 11, 2010 at 07:17:43PM +0100, me said:
: It was just an idea though and I was hoping that there would be a simpler,
: more orthodox way of doing it.

In the end, for anyone who cares, we used dynamic fields. There are a lot of them, but we haven't seen performance impacted that badly so far.
Re: weighted facets
Hi,

Answering my own question(s): result grouping could be the solution, as I explained here:

  https://issues.apache.org/jira/browse/SOLR-385
  http://www.cs.cmu.edu/~ddash/papers/facets-cikm.pdf (the file is dated to Aug 2008)

yonik implemented this here: https://issues.apache.org/jira/browse/SOLR-153

So, really cool: he's the inventor/first-thinker of their 'bitset tree'! :-)
http://search.lucidimagination.com/search/document/6ccbec5e602687ae/facet_optimizing#6ccbec5e602687ae

Regards,
Peter.

> Hi, I need a feature which is well explained by Mr Goll at this site **. So it would then
> be nice to do something like:
>
>   facet.stats=sum(fieldX)&facet.stats.sort=fieldX
>
> And the output (sorted against the sum output) could look something like this:
>
>   <lst name="facet_counts">
>     <lst name="facet_fields">
>       <lst name="tag">
>         <int name="jobs" fieldX="14700767"/>
>         <int name="video" fieldX="13700892"/>
>
> Is there something similar, or was this answered by Hoss at Lucene Revolution? If not
> I'll open a JIRA issue ...
>
> BTW: is the work from http://www.cs.cmu.edu/~ddash/papers/facets-cikm.pdf contributed
> back to Solr?
>
> Regards,
> Peter.
>
> PS: Related issue:
> https://issues.apache.org/jira/browse/SOLR-680
> https://issues.apache.org/jira/secure/attachment/12400054/SOLR-680.patch
>
> ** http://lucene.crowdvine.com/posts/14137409
> Quoting his question in case the site goes offline:
>
> Hi Chris, Usually a facet search returns the document count for the unique values in
> the facet field. Is there a way to return a weighted facet count based on a user-defined
> function (sum, product, etc.) of another field? Here is a sum example. Assume we have
> the following 4 documents with 3 fields:
>
>   ID  facet_field  weight_field
>   1   solr         0.4
>   2   lucene       0.3
>   3   lucene       0.1
>   4   lucene       0.2
>
> Is there a way to return "solr 0.4, lucene 0.6" instead of "solr 1, lucene 3"?
>
> Given the facet_field contains multiple values:
>
>   ID  facet_field  weight_field
>   1   solr lucene  0.2
>   2   lucene       0.3
>   3   solr lucene  0.1
>   4   lucene       0.2
>
> Is there a way to return "solr 0.3, lucene 0.8" instead of "solr 2, lucene 4"?
>
> Thanks, Johannes

--
http://jetwick.com twitter search prototype
Re: Term is duplicated when updating a document
This is actually known behavior. The problem is that when you update a document, the new version is added and the original is only marked as deleted. The terms aren't touched, so both the original and the new document's terms are counted. It'd be hard, very hard, to remove the terms from the inverted index efficiently. But when you optimize, all the deleted documents (and their associated terms) are physically removed from the files, thus your term counts change.

HTH
Erick

On Fri, Oct 15, 2010 at 10:05 AM, Thomas Kellerer spam_ea...@gmx.net wrote:

Thanks for the answer.

> Which fields are modified when the document is updated/replaced?
Only one field was changed, but it was not the one where the auto-suggest term is coming from.

> Are there any differences in the content of the fields that you are using for the AutoSuggest?
No.

> Have you changed your schema.xml file recently? If you have, then there may have been
> changes in the way these fields are analyzed and broken down to terms.
No, I did a complete index rebuild to rule out things like that. Then after startup I did a search, then updated the document and did a search again.

Regards
Thomas

> This may be a bug if you did not change the field or the schema file but the term count is changing.
>
> On Fri, Oct 15, 2010 at 9:14 AM, Thomas Kellerer spam_ea...@gmx.net wrote:
>
> Hi, we are updating our documents (which represent products in our shop) when a dealer modifies them, by calling SolrServer.add(SolrInputDocument) with the updated document. My understanding is that there is no other way of updating an existing document. However, we also use a terms query to autocomplete the search field for the user, but each time a document is updated (added) the term count is incremented. So after starting with a new index the count is e.g. 1; then the document (that contains that term) is updated, and the count is 2; the next update will set it to 3, and so on. Once the index is optimized (by calling SolrServer.optimize()) the count is correct again. Am I missing something, or is this a bug in Solr/Lucene?
>
> Thanks in advance
> Thomas
RE: filter query from external list of Solr unique IDs
The main problem I've encountered with the "lots of OR clauses" approach is that you eventually hit the limit on Boolean clauses and the whole query fails. You can keep raising the limit through the Solr configuration, but there's still a ceiling eventually.

- Demian

-----Original Message-----
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Friday, October 15, 2010 1:07 PM
To: solr-user@lucene.apache.org
Subject: RE: filter query from external list of Solr unique IDs

Definitely interested in this. The naive obvious approach would be just putting all the IDs in the query, like fq=(id:1 OR id:2 OR ...), or making it another clause in the 'q'. Can you outline what's wrong with this approach, to make it clearer what's needed in a solution?

________________________________________
From: Burton-West, Tom [tburt...@umich.edu]
Sent: Friday, October 15, 2010 11:49 AM
To: solr-user@lucene.apache.org
Subject: filter query from external list of Solr unique IDs

At the Lucene Revolution conference I asked about efficiently building a filter query from an external list of Solr unique ids. Some use cases I can think of are:

1) personal sub-collections (in our case a user can create a small subset of our 6.5 million doc collection and then run filter queries against it)
2) tagging documents
3) access control lists
4) anything that needs complex relational joins
5) a sort of alternative to incremental field updating (i.e. update in an external database or kv store)
6) Grant's clustering cluster points and similar apps

Grant pointed to SOLR-1715, but when I looked on JIRA, there doesn't seem to be any work on it yet. Hoss mentioned a couple of ideas: 1) sub-classing the query parser, 2) having the app query a database and somehow passing something to Solr or Lucene for the filter query.

Can Hoss or someone else point me to more detailed information on what might be involved in the two ideas listed above? Is somehow keeping an up-to-date map of unique Solr ids to internal Lucene ids needed to implement this, or is that a separate issue?

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search
RE: filter query from external list of Solr unique IDs
Hi Jonathan,

The advantages of the obvious approach you outline are that it is simple, it fits into the existing Solr model, and it doesn't require any customization or modification to Solr/Lucene Java code. Unfortunately, it does not scale well.

We originally tried just what you suggest for our implementation of Collection Builder. For a user's personal collection we had a table that mapped the collection id to the unique Solr ids. Then when they wanted to search their collection, we just took their search and added a filter query with fq=(id:1 OR id:2 OR ...). I seem to remember running into a limit on the number of OR clauses allowed. Even if you can set that limit larger, there are a number of efficiency issues.

We ended up constructing a separate Solr index where we have a multi-valued collection number field. Unfortunately, until incremental field updating gets implemented, this means that every time someone adds a document to a collection, the entire document (including 700KB of OCR) needs to be re-indexed just to update the collection number field. This approach has allowed us to scale up to a total of something under 100,000 documents, but we don't think we can scale it much beyond that for various reasons.

I was actually thinking of some kind of custom Lucene/Solr component that would, for example, take a query parameter such as lookitUp=123, and the component might do a JDBC query against a database or kv store and return results in some form that would be efficient for Solr/Lucene to process. (Of course this assumes that a JDBC query would be more efficient than just sending a long list of ids to Solr.)

The other part of the equation is mapping the unique Solr ids to internal Lucene ids in order to implement a filter query. I was wondering if something like the unique-id-to-Lucene-id mapper in zoie might be useful, or if that is too specific to zoie. So this may be totally off-base, since I haven't looked at the zoie code at all yet.

In our particular use case, we might be able to build some kind of in-memory map after we optimize an index and before we mount it in production. In our workflow, we update the index and optimize it before we release it, and once it is released to production there is no indexing/merging taking place on the production index (so the internal Lucene ids don't change).

Tom

-----Original Message-----
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Friday, October 15, 2010 1:07 PM
To: solr-user@lucene.apache.org
Subject: RE: filter query from external list of Solr unique IDs

Definitely interested in this. The naive obvious approach would be just putting all the IDs in the query, like fq=(id:1 OR id:2 OR ...), or making it another clause in the 'q'. Can you outline what's wrong with this approach, to make it clearer what's needed in a solution?
facet.field :java.lang.NullPointerException
Faceting blows up when the field has no data, and this seems to be random. Sometimes it will work even with no data, other times not. Sometimes the error goes away if the field is set to multiValued="true" (even though it's one value every time), other times it doesn't. In all cases setting facet.method to enum takes care of the problem. If this param is not set, the default leads to a null pointer exception.

  09:18:52,218 SEVERE [SolrCore] Exception during facet.field of xyz: java.lang.NullPointerException
      at java.lang.System.arraycopy(Native Method)
      at org.apache.lucene.util.PagedBytes.copy(PagedBytes.java:247)
      at org.apache.solr.request.TermIndex$1.setTerm(UnInvertedField.java:1164)
      at org.apache.solr.request.NumberedTermsEnum.<init>(UnInvertedField.java:960)
      at org.apache.solr.request.TermIndex$1.<init>(UnInvertedField.java:1151)
      at org.apache.solr.request.TermIndex.getEnumerator(UnInvertedField.java:1151)
      at org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:204)
      at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:188)
      at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:911)
      at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:298)
      at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:354)
      at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:190)
      at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
      at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
      at
Re: facet.field :java.lang.NullPointerException
This is https://issues.apache.org/jira/browse/SOLR-2142 - I'll look into it soon.

-Yonik
http://www.lucidimagination.com

On Fri, Oct 15, 2010 at 3:12 PM, Pradeep Singh pksing...@gmail.com wrote:

Faceting blows up when the field has no data, and this seems to be random. Sometimes it will work even with no data, other times not. Sometimes the error goes away if the field is set to multiValued="true" (even though it's one value every time), other times it doesn't. In all cases setting facet.method to enum takes care of the problem. If this param is not set, the default leads to a null pointer exception.

  09:18:52,218 SEVERE [SolrCore] Exception during facet.field of xyz: java.lang.NullPointerException
      at java.lang.System.arraycopy(Native Method)
      at org.apache.lucene.util.PagedBytes.copy(PagedBytes.java:247)
      at org.apache.solr.request.TermIndex$1.setTerm(UnInvertedField.java:1164)
      at org.apache.solr.request.NumberedTermsEnum.<init>(UnInvertedField.java:960)
      at org.apache.solr.request.TermIndex$1.<init>(UnInvertedField.java:1151)
      at org.apache.solr.request.TermIndex.getEnumerator(UnInvertedField.java:1151)
      at org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:204)
      at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:188)
      at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:911)
      at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:298)
      at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:354)
      at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:190)
      at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
      at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
      at
Re: Synchronizing Solr with a PostgreDB
Thanks for the quick response! =o) We will go with that approach.

On Thu, Oct 14, 2010 at 7:19 PM, Allistair Crossley a...@roxxor.co.uk wrote:

i would not cross-reference solr results with your database to merge unless you want to spank your database. nor would i load solr with all your data. what i have found is that the search results page is generally a small subset of data relating to the fuller document/result. therefore i store only the data required to present the search results wholly from solr. the user can choose to click into a specific result, which then uses just the database to present it.

use the data import handler - define an xml config to import as many entities into your document as you need, and map columns to fields in schema.xml. use the wiki page on DIH - it's all there, as well as example config in the solr distro.

allistair

On Oct 14, 2010, at 6:13 PM, Juan Manuel Alvarez wrote:

Hello everyone! I am new to Solr and Lucene and I would like to ask you a couple of questions. I am working on an existing system that has the data saved in a Postgres DB, and now I am trying to integrate Solr to use full-text search and faceted search, but I have a couple of doubts about it.

1) I see two ways of storing the data and making the search:
- Duplicate all the DB data in Solr, so complete results are returned from a search query, or...
- Put in Solr just the data that I need to search and, after finding the elements with a Solr query, use the result to make a more specific query to the DB.
Which is the way this is normally done?

2) How do I synchronize Solr and Postgres? Do I have to use the DataImportHandler, or when I do the INSERT command into Postgres, do I have to execute a command against Solr?

Thanks for your time! Cheers! Juan M.
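A hedged sketch of the DIH data-config Allistair describes, for a PostgreSQL source (the JDBC URL, credentials, table, and column-to-field mappings are placeholders, and the PostgreSQL JDBC driver jar must be on Solr's classpath):

  <dataConfig>
    <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
                url="jdbc:postgresql://localhost:5432/mydb"
                user="solr" password="secret"/>
    <document>
      <!-- one entity per document type; columns map onto schema.xml fields -->
      <entity name="product"
              query="SELECT id, name, description FROM products">
        <field column="id" name="id"/>
        <field column="name" name="name"/>
        <field column="description" name="description"/>
      </entity>
    </document>
  </dataConfig>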
Re: SOLRJ - Searching text in all fields of a Bean
You can replace

  query.setQueryType("dismax");

with

  query.set("defType", "dismax");

Also don't forget to request the title field with the fl parameter:

  query.addField("title");
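Putting Ahmet's two changes together, the earlier query would look something like this (a sketch, assuming the same server and SOLRTitle bean as in the messages above):

  SolrQuery query = new SolrQuery();
  query.set("defType", "dismax");   // select the dismax query parser
  query.setQuery("kitten");
  query.setParam("qf", "title");    // fields dismax searches
  query.addField("id");             // request stored fields explicitly...
  query.addField("title");          // ...so getBeans() can populate them
  QueryResponse rsp = server.query(query);
  List<SOLRTitle> beans = rsp.getBeans(SOLRTitle.class);
  for (SOLRTitle t : beans) {
      System.out.println(t.id + " : " + t.title);
  }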
Re: Solr with example Jetty and score problem
: Thanks. But do you have any suggestion or work-around to deal with it?

Posted in SOLR-2140:

  <field name="score" type="ignored" multiValued="false"/>

...the key is to make sure Solr knows score is not multiValued.

-Hoss
Re: ant build problem
: i updated my solr trunk to revision 1004527. when i go for compiling
: the trunk with ant i get so many warnings, but the build is successful.

Most of these warnings are legitimate; the problems have always been there, but recently the Lucene build file was updated to warn about them by default. This one though...

: [javac] warning: [path] bad path element
: /usr/share/ant/lib/hamcrest-core.jar: no such file or directory

...that's something specific to your setup. Something in your system's ant configs thinks that jar should be there.

: After the compiling i thought to check with the ant test and performed but
: it is failed..

Failing tests are also a possibility ... there are several tests in the code base right now that fail sporadically (especially because of recent changes to the build system designed to get tests that *might* fail based on locale to fail more often) and people are working on them -- without full details about what failures you got, though, we can't say if they are known issues.

-Hoss
Re: having problem about Solr Date Field.
: So, regarding DST, do you put everything in GMT, and make adjustments
: for in the 'search for/between' date/time values before the query for
: both DST and TZ?

The client adding docs is the only one that knows what TZ it's in when it formats the docs to add them, and the client issuing the query is the only one that knows what TZ it's in when it formats the query string to execute the query. In both cases the client must use the UTC TZ when formatting the date strings so that Solr can deal with them correctly.

-Hoss
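For illustration, one common way for a Java client in any local time zone to produce the UTC date strings Solr expects (a sketch; the ISO 8601 pattern shown is Solr's canonical date format):

  import java.text.SimpleDateFormat;
  import java.util.Date;
  import java.util.TimeZone;

  public class SolrDateFormat {
      public static String toSolrDate(Date d) {
          // Solr date fields expect ISO 8601 in UTC, e.g. 2010-10-15T19:38:01Z
          SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
          fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // format in UTC regardless of local TZ
          return fmt.format(d);
      }

      public static void main(String[] args) {
          System.out.println(toSolrDate(new Date()));
      }
  }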
Re: Question related to phrase search in lucene/solr?
: I have a question: is it possible to perform a phrase search with wildcards in
: solr/lucene? I have two queries which both have exactly the same results. One is
:   +Contents:"change market"
: and the other is
:   +Contents:"change* market"
: but I think the second should match "changes market" as well, and it does not
: match it. Any help would be appreciated.

In my experience, 90% of the time people ask about using wildcards in a phrase query, what they really want is simple stemming of the terms -- the one example you've cited is an example of this. If your Contents field uses an analyzer that does stemming, then "change market" and "changes market" would both match.

-Hoss
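A hedged sketch of what such a stemming field type could look like in schema.xml (modeled on the stock example schema; the type and field names here are placeholders):

  <fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- reduces "changes" and "change" to the same indexed term -->
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="Contents" type="text_stemmed" indexed="true" stored="true"/>

With that analysis chain, the phrase query "change market" matches documents containing "changes market" without any wildcard.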
Re: Disable (or prohibit) per-field overrides
: Anyone knows useful method to disable or prohibit the per-field override
: features for the search components? If not, where to start to make it
: configurable via solrconfig and attempt to come up with a working patch?

If your goal is to prevent *clients* from specifying these (while you're still allowed to use them in your defaults), then the simplest solution is probably something external to Solr -- along the lines of mod_rewrite.

Internally... that would be tough. You could probably write a SearchComponent (configured to run first) that does it fairly easily -- just wrap the SolrParams in an impl that returns null anytime a component asks for a param name that starts with "f." (and excludes those param names when asked for a list of the param names). It could probably be generalized to support arbitrary rules in a way that might be handy for other folks, but it would still just be wrapping all of the params, so it would prevent you from using them in your config as well.

Ultimately I think a general solution would need to be in RequestHandlerBase ... where it wraps the request params using the defaults and invariants ... you'd want the custom exclusion rules to apply only to the request params from the client.

-Hoss
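An untested sketch of the first-in-chain SearchComponent Hoss describes, against the Solr 1.4-era API (the class name is invented, and this hides f.* params from everything downstream, including your own defaults, per the caveat above):

  import java.util.ArrayList;
  import java.util.Iterator;
  import java.util.List;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;

  public class StripPerFieldParams extends SearchComponent {
      @Override
      public void prepare(ResponseBuilder rb) {
          final SolrParams original = rb.req.getParams();
          // wrap the params so any name starting with "f." appears absent
          rb.req.setParams(new SolrParams() {
              @Override
              public String get(String name) {
                  return name.startsWith("f.") ? null : original.get(name);
              }
              @Override
              public String[] getParams(String name) {
                  return name.startsWith("f.") ? null : original.getParams(name);
              }
              @Override
              public Iterator<String> getParameterNamesIterator() {
                  List<String> names = new ArrayList<String>();
                  for (Iterator<String> it = original.getParameterNamesIterator(); it.hasNext();) {
                      String name = it.next();
                      if (!name.startsWith("f.")) names.add(name);
                  }
                  return names.iterator();
              }
          });
      }

      @Override
      public void process(ResponseBuilder rb) { /* nothing to do at process time */ }

      @Override
      public String getDescription() { return "strips per-field (f.*) overrides"; }
      @Override
      public String getVersion() { return "1.0"; }
      @Override
      public String getSourceId() { return ""; }
      @Override
      public String getSource() { return ""; }
  }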
RE: filter query from external list of Solr unique IDs
Thanks Yonik, Is this something you might have time to throw together, or an outline of what needs to be thrown together? Is this something that should be asked on the developer's list or discussed in SOLR 1715 or does it make the most sense to keep the discussion in this thread? Tom -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Friday, October 15, 2010 1:19 PM To: solr-user@lucene.apache.org Subject: Re: filter query from external list of Solr unique IDs On Fri, Oct 15, 2010 at 11:49 AM, Burton-West, Tom tburt...@umich.edu wrote: At the Lucene Revolution conference I asked about efficiently building a filter query from an external list of Solr unique ids. Yeah, I've thought about a special query parser and query to deal with this (relatively) efficiently, both from a query perspective and a memory perspective. Should be pretty quick to throw together: - comma separated list of terms (unique ids are a special case of this) - in the query, store as a single byte array for efficiency - sort the ids if they aren't already sorted - do lookups with a term enumerator and skip weighting or anything else like that - configurable caching... may, or may not want to cache this big query That's only part of the stuff you mention, but seems like it would be useful to a number of people. -Yonik http://www.lucidimagination.com
SOLR DateTime and SortableLongField field type problems
Hello all,

I am using SOLR-1.4.1 with the DataImportHandler, and I am trying to follow the advice from http://www.mail-archive.com/solr-user@lucene.apache.org/msg11887.html about converting date fields to SortableLong fields for better memory efficiency. However, whenever I try to do this using the DateFormatTransformer, I get exceptions when indexing for every row that tries to create my sortable fields.

In my schema.xml, I have the following definitions for the fieldType and dynamicField:

  <fieldType name="sdate" class="solr.SortableLongField" indexed="true" stored="false"
             sortMissingLast="true" omitNorms="true"/>
  <dynamicField name="sort_date_*" type="sdate" stored="false" indexed="true"/>

In my dih.xml, I have the following definitions:

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <entity name="xml_stories" rootEntity="false" dataSource="null"
              processor="FileListEntityProcessor" fileName="legacy_stories.*\.xml$"
              recursive="false" baseDir="/usr/local/extracts"
              newerThan="${dataimporter.xml_stories.last_index_time}">
        <entity name="stories" pk="id" dataSource="xml_stories"
                processor="XPathEntityProcessor" url="${xml_stories.fileAbsolutePath}"
                forEach="/RECORDS/RECORD" stream="true"
                transformer="DateFormatTransformer,HTMLStripTransformer,RegexTransformer,TemplateTransformer"
                onError="continue">
          <field column="_modified_date" xpath="/RECORDS/RECORD/PROP[@NAME='R_ModifiedTime']/PVAL"/>
          <field column="modified_date" sourceColName="_modified_date" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'"/>
          <field column="_df_date_published" xpath="/RECORDS/RECORD/PROP[@NAME='R_StoryDate']/PVAL"/>
          <field column="df_date_published" sourceColName="_df_date_published" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'"/>
          <field column="sort_date_modified" sourceColName="modified_date" dateTimeFormat="yyyyMMddhhmmss"/>
          <field column="sort_date_published" sourceColName="df_date_published" dateTimeFormat="yyyyMMddhhmmss"/>
        </entity>
      </entity>
    </document>
  </dataConfig>

The fields in question are in these formats:

  <RECORDS>
    <RECORD>
      <PROP NAME="R_StoryDate">
        <PVAL>2001-12-04T00:00:00Z</PVAL>
      </PROP>
      <PROP NAME="R_ModifiedTime">
        <PVAL>2001-12-04T19:38:01Z</PVAL>
      </PROP>
    </RECORD>
  </RECORDS>

The exception that I am receiving is:

  Oct 15, 2010 6:23:24 PM org.apache.solr.handler.dataimport.DateFormatTransformer transformRow
  WARNING: Could not parse a Date field
  java.text.ParseException: Unparseable date: "Wed Nov 28 21:39:05 EST 2007"
      at java.text.DateFormat.parse(DateFormat.java:337)
      at org.apache.solr.handler.dataimport.DateFormatTransformer.process(DateFormatTransformer.java:89)
      at org.apache.solr.handler.dataimport.DateFormatTransformer.transformRow(DateFormatTransformer.java:69)
      at org.apache.solr.handler.dataimport.EntityProcessorWrapper.applyTransformer(EntityProcessorWrapper.java:195)
      at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:241)
      at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
      at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
      at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
      at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
      at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
      at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
      at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)

I know that it has to be the SortableLong fields, because if I remove just those two lines from my dih.xml, everything imports as I expect it to. Am I doing something wrong? Mis-using the SortableLong and/or the DateFormatTransformer? Is this not supported in my version of SOLR? I'm not very experienced with Java, so digging into the code would be a lost cause for me right now. I was hoping that somebody here might be able to help point me in the right direction.

It should be noted that the modified_date and df_date_published fields index just fine (so long as I do it as I've defined above).

Thank you,
- Ken

"It looked like something resembling white marble, which was probably what it was: something resembling white marble." -- Douglas Adams, The Hitchhiker's Guide to the Galaxy
Re: Synchronizing Solr with a PostgreDB
We're doing what was recommended. Nice to hear we're on the right path. Yeah Postgres! Yeah Solr/Lucene!

Dennis Gearon

Signature Warning: It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life, otherwise we all die.

--- On Fri, 10/15/10, Juan Manuel Alvarez naici...@gmail.com wrote:

From: Juan Manuel Alvarez naici...@gmail.com
Subject: Re: Synchronizing Solr with a PostgreDB
To: solr-user@lucene.apache.org
Date: Friday, October 15, 2010, 1:04 PM

Thanks for the quick response! =o) We will go with that approach.

On Thu, Oct 14, 2010 at 7:19 PM, Allistair Crossley a...@roxxor.co.uk wrote:

i would not cross-reference solr results with your database to merge unless you want to spank your database. nor would i load solr with all your data. what i have found is that the search results page is generally a small subset of data relating to the fuller document/result. therefore i store only the data required to present the search results wholly from solr. the user can choose to click into a specific result, which then uses just the database to present it.

use the data import handler - define an xml config to import as many entities into your document as you need, and map columns to fields in schema.xml. use the wiki page on DIH - it's all there, as well as example config in the solr distro.

allistair

On Oct 14, 2010, at 6:13 PM, Juan Manuel Alvarez wrote:

Hello everyone! I am new to Solr and Lucene and I would like to ask you a couple of questions. I am working on an existing system that has the data saved in a Postgres DB, and now I am trying to integrate Solr to use full-text search and faceted search, but I have a couple of doubts about it.

1) I see two ways of storing the data and making the search:
- Duplicate all the DB data in Solr, so complete results are returned from a search query, or...
- Put in Solr just the data that I need to search and, after finding the elements with a Solr query, use the result to make a more specific query to the DB.
Which is the way this is normally done?

2) How do I synchronize Solr and Postgres? Do I have to use the DataImportHandler, or when I do the INSERT command into Postgres, do I have to execute a command against Solr?

Thanks for your time! Cheers! Juan M.
Re: Virtual field, Statistics
Please add a JIRA issue requesting this. A bunch of things are not supported for functions: returning a function as a field value, for example.

On Thu, Oct 14, 2010 at 8:31 AM, Tanguy Moal tanguy.m...@gmail.com wrote:

Dear solr-user folks,

I would like to use the stats module to perform very basic statistics (mean, min and max), which is actually working just fine. Nevertheless I found a little limitation that bothers me a tiny bit: how to perform the exact same statistics, but on the result of a function query rather than a field.

Example schema:
- string: id
- float: width
- float: height
- float: depth
- string: color
- float: price

What I'd like to do is something like:

  select?q=price:[45.5 TO 99.99]&stats=on&stats.facet=color&stats.field={volume=product(product(width,height),depth)}

I would expect to obtain:

  <lst name="stats">
    <lst name="stats_fields">
      <lst name="(product(product(width,height),depth))">
        <double name="min">...</double>
        <double name="max">...</double>
        <double name="sum">...</double>
        <long name="count">...</long>
        <long name="missing">...</long>
        <double name="sumOfSquares">...</double>
        <double name="mean">...</double>
        <double name="stddev">...</double>
        <lst name="facets">
          <lst name="color">
            <lst name="white">
              <double name="min">...</double>
              <double name="max">...</double>
              <double name="sum">...</double>
              <long name="count">...</long>
              <long name="missing">...</long>
              <double name="sumOfSquares">...</double>
              <double name="mean">...</double>
              <double name="stddev">...</double>
            </lst>
            <lst name="red">
              <double name="min">...</double>
              <double name="max">...</double>
              <double name="sum">...</double>
              <long name="count">...</long>
              <long name="missing">...</long>
              <double name="sumOfSquares">...</double>
              <double name="mean">...</double>
              <double name="stddev">...</double>
            </lst>
            <!-- Other facets on other colors go here -->
          </lst>
        </lst>
      </lst>
    </lst>
  </lst>

Of course computing the volume can be performed before indexing the data, but defining virtual fields on the fly given an arbitrary function is powerful, and I am comfortable with the idea that many others would appreciate it. Especially for BI needs and so on... :-D

Is there a way to do it easily that I have not been able to find, or is it actually impossible?

Thank you very much in advance for your help.

--
Tanguy

--
Lance Norskog goks...@gmail.com