Re: Attempting dataimport using FileListEntityProcessor
I do want to import all documents. My understanding of how things work (correct me if I'm wrong) is that a single atomic update can include a certain number of documents. Rather than making all 16 million of my documents part of a single update, which could more easily fail because it is so big, I was thinking it would be better to be able to stipulate how many docs go into each update, so that my 16-million-doc import would consist of 16M/100 updates.

Shalin Shekhar Mangar wrote:

Hi Mike, Just curious to know the use-case here. Why do you want to limit updates to 100 instead of importing all documents?

On Tue, Jun 24, 2008 at 10:23 AM, mike segv [EMAIL PROTECTED] wrote:

That fixed it. If I'm inserting millions of documents, how do I control docs/update? E.g. if there are 50K docs per file, I'm thinking that I should probably code up my own DataSource that allows me to stipulate docs/update. Like, say, 100 instead of 50K. Does this make sense? Mike

Noble Paul നോബിള് नोब्ळ् wrote:

Hi, You have not registered any data sources. The second entity needs a data source. Remove the dataSource=null and add a name for the second entity (good practice). There is no need for a baseDir attribute on the second entity. See the modified XML below. --Noble

<dataConfig>
  <dataSource type="FileDataSource"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" fileName=".*xml" newerThan="'NOW-10DAYS'" recursive="true" rootEntity="false" dataSource="null" baseDir="/san/tomcat-services/solr-medline">
      <entity name="x" processor="XPathEntityProcessor" forEach="/MedlineCitation" url="${f.fileAbsolutePath}">
        <field column="pmid" xpath="/MedlineCitation/PMID"/>
      </entity>
    </entity>
  </document>
</dataConfig>

On Tue, Jun 24, 2008 at 6:39 AM, mike segv [EMAIL PROTECTED] wrote:

I'm trying to use the FileListEntityProcessor to add some XML documents to a Solr index. I'm running a nightly version of solr-1.3 with SOLR-469 and SOLR-563. I've been able to successfully run the slashdot HttpDataSource example.
My data-config.xml file loads without errors. When I attempt the full-import command I get the exception below. Thanks for any help. Mike

WARNING: No lockType configured for /san/tomcat-services/solr-medline/solr/data/index/ assuming 'simple'
Jun 23, 2008 7:59:49 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:97)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:212)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:166)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:149)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:286)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:312)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:140)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:335)
    at org.apache.solr.handler.dataimport.DataImporter.rumCmd(DataImporter.java:386)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
Caused by: java.lang.NullPointerException
    at java.io.Reader.<init>(Reader.java:61)
    at java.io.BufferedReader.<init>(BufferedReader.java:76)
    at com.bea.xml.stream.MXParser.checkForXMLDecl(MXParser.java:775)
    at com.bea.xml.stream.MXParser.setInput(MXParser.java:806)
    at com.bea.xml.stream.MXParserFactory.createXMLStreamReader(MXParserFactory.java:261)
    at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:93)
    ... 10 more

Here is my data-config:

<dataConfig>
  <document>
    <entity name="f" processor="FileListEntityProcessor" fileName=".*xml" newerThan="'NOW-10DAYS'" recursive="true" rootEntity="false" dataSource="null" baseDir="/san/tomcat-services/solr-medline">
      <entity processor="XPathEntityProcessor" forEach="/MedlineCitation" url="${f.fileAbsolutePath}" dataSource="null">
        <field column="pmid" xpath="/MedlineCitation/PMID"/>
      </entity>
    </entity>
  </document>
</dataConfig>

And a snippet from an xml file:

<MedlineCitation Owner="PIP" Status="MEDLINE">
<PMID>12236137</PMID>
<DateCreated>
<Year>1980</Year>
<Month>01</Month>
<Day>03</Day>
</DateCreated>

--
View this message in context: http://www.nabble.com/Attempting-dataimport-using-FileListEntityProcessor-tp18081671p18081671.html
Sent from the Solr - User mailing list archive at Nabble.com
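The docs/update idea discussed in this thread (16M documents split into updates of 100) can be illustrated with a small sketch. It is not DataImportHandler code and does not call any Solr API; the names `batches` and `count_updates` are hypothetical, and how each batch would actually be sent to Solr is left out on purpose.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(docs: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` documents.

    Hypothetical helper: each yielded list stands for one update.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def count_updates(total_docs: int, docs_per_update: int) -> int:
    """Number of updates needed to cover all documents (ceiling division)."""
    return -(-total_docs // docs_per_update)
```

With these assumptions, `count_updates(16_000_000, 100)` gives the number of updates for the whole import, and `count_updates(50_000, 100)` the number for a single 50K-document file.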
Attempting dataimport using FileListEntityProcessor
I'm trying to use the FileListEntityProcessor to add some XML documents to a Solr index. I'm running a nightly version of solr-1.3 with SOLR-469 and SOLR-563. I've been able to successfully run the slashdot HttpDataSource example. My data-config.xml file loads without errors. When I attempt the full-import command I get the exception below. Thanks for any help. Mike

WARNING: No lockType configured for /san/tomcat-services/solr-medline/solr/data/index/ assuming 'simple'
Jun 23, 2008 7:59:49 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:97)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:212)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:166)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:149)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:286)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:312)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:140)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:335)
    at org.apache.solr.handler.dataimport.DataImporter.rumCmd(DataImporter.java:386)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
Caused by: java.lang.NullPointerException
    at java.io.Reader.<init>(Reader.java:61)
    at java.io.BufferedReader.<init>(BufferedReader.java:76)
    at com.bea.xml.stream.MXParser.checkForXMLDecl(MXParser.java:775)
    at com.bea.xml.stream.MXParser.setInput(MXParser.java:806)
    at com.bea.xml.stream.MXParserFactory.createXMLStreamReader(MXParserFactory.java:261)
    at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:93)
    ... 10 more

Here is my data-config:

<dataConfig>
  <document>
    <entity name="f" processor="FileListEntityProcessor" fileName=".*xml" newerThan="'NOW-10DAYS'" recursive="true" rootEntity="false" dataSource="null" baseDir="/san/tomcat-services/solr-medline">
      <entity processor="XPathEntityProcessor" forEach="/MedlineCitation" url="${f.fileAbsolutePath}" dataSource="null">
        <field column="pmid" xpath="/MedlineCitation/PMID"/>
      </entity>
    </entity>
  </document>
</dataConfig>

And a snippet from an xml file:

<MedlineCitation Owner="PIP" Status="MEDLINE">
<PMID>12236137</PMID>
<DateCreated>
<Year>1980</Year>
<Month>01</Month>
<Day>03</Day>
</DateCreated>

--
View this message in context: http://www.nabble.com/Attempting-dataimport-using-FileListEntityProcessor-tp18081671p18081671.html
Sent from the Solr - User mailing list archive at Nabble.com.
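As a rough illustration of what the inner entity in the config above is meant to produce (one record per /MedlineCitation element, with a pmid column taken from /MedlineCitation/PMID), here is a minimal sketch using Python's standard xml.etree.ElementTree. It does not reproduce how XPathEntityProcessor works internally. The function name `extract_pmid` is hypothetical, and the sample is the snippet from the message with a closing </MedlineCitation> tag added as an assumption so that it parses.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Snippet from the message; the closing tag is an assumption added for parsing.
SAMPLE = """<MedlineCitation Owner="PIP" Status="MEDLINE">
  <PMID>12236137</PMID>
  <DateCreated>
    <Year>1980</Year>
    <Month>01</Month>
    <Day>03</Day>
  </DateCreated>
</MedlineCitation>"""


def extract_pmid(xml_text: str) -> Optional[Dict[str, str]]:
    """Return {"pmid": ...} when the root is MedlineCitation with a PMID child."""
    root = ET.fromstring(xml_text)
    if root.tag != "MedlineCitation":
        # Mirrors forEach=/MedlineCitation: other root elements yield no record.
        return None
    pmid = root.findtext("PMID")
    return {"pmid": pmid} if pmid is not None else None
```

Calling `extract_pmid(SAMPLE)` returns a single-field record keyed by the column name used in the config.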
Re: [Update] Solr can be started from jetty but not tomcat
The error messages caused by this problem are very misleading. After a lot of trial and error I got Solr to work with Tomcat by adding xalan.jar to the libs directory and rebuilding the war file.

Vinci wrote:

Hi all, after several hours I got Solr working a little: the Jetty version works, but the Tomcat version doesn't. Environment: JRE 1.6, Tomcat 5.5, Ubuntu 7.10, Solr nightly (8 Mar 08). It looks like multicore.xml causes the problem... Solr dies at config time?

In the localhost log:

org.apache.catalina.core.StandardContext filterStart
SEVERE: Exception starting filter SolrRequestFilter
java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.core.SolrConfig
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:114)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:78)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:740)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:736)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
    at org.apache.catalina.core.StandardService.start(StandardService.java:448)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:700)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:552)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:433)

Catalina log:

org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
org.apache.solr.core.SolrResourceLoader locateInstanceDir
INFO: Using JNDI solr.home: /var/webapps/solr
org.apache.solr.servlet.SolrDispatchFilter init
INFO: looking for multicore.xml: /var/webapps/solr/multicore.xml
org.apache.solr.servlet.SolrDispatchFilter init
SEVERE: Could not start SOLR. Check solr/home property
java.lang.ExceptionInInitializerError
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:104)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:78)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:740)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
    at