Re: Attempting dataimport using FileListEntityProcessor

2008-06-24 Thread mike segv

I do want to import all documents. My understanding of how things work
(correct me if I'm wrong) is that a certain number of documents can be
included in a single atomic update. Instead of making all 16 million
documents part of one update (which, being so big, would be more likely to
fail), I thought it would be better to be able to stipulate how many docs go
into each update, so that my 16-million-doc import would consist of
16M/100 updates.
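As a rough illustration of the batching described above, 16 million documents at 100 per update works out to 160,000 updates, and a 50K-document file would become 500. The sketch below is plain Java and does not use any DataImportHandler API. It only groups a stream of documents into fixed-size batches and hands each batch to a callback. `BatchPlanner`, `updateCount`, and `inBatches` are hypothetical names, and whatever actually sends each batch to Solr is left as an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;

public class BatchPlanner {
    // Number of updates needed for `total` documents sent `batchSize` at a time.
    // Ceiling division, so a partial final batch still counts as one update.
    static long updateCount(long total, int batchSize) {
        return (total + batchSize - 1) / batchSize;
    }

    // Groups documents into batches of at most `batchSize` and passes each batch to `sink`
    // (hypothetically, code that sends one update per batch).
    static <T> void inBatches(Iterator<T> docs, int batchSize, Consumer<List<T>> sink) {
        List<T> batch = new ArrayList<>(batchSize);
        while (docs.hasNext()) {
            batch.add(docs.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batch = new ArrayList<>(batchSize); // fresh list, so the sink may keep the old one
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // final, partial batch
        }
    }

    public static void main(String[] args) {
        System.out.println(updateCount(16_000_000L, 100)); // 160000
        System.out.println(updateCount(50_000L, 100));     // 500

        List<Integer> sizes = new ArrayList<>();
        inBatches(IntStream.range(0, 250).iterator(), 100, batch -> sizes.add(batch.size()));
        System.out.println(sizes); // [100, 100, 50]
    }
}
```

Keeping batch construction separate from sending makes the batch size a single parameter, which is the knob the question is asking about.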


Shalin Shekhar Mangar wrote:
 
 Hi Mike,
 
 Just curious to know the use case here. Why do you want to limit updates
 to 100 instead of importing all documents?
 
 On Tue, Jun 24, 2008 at 10:23 AM, mike segv [EMAIL PROTECTED] wrote:
 

 That fixed it.

 If I'm inserting millions of documents, how do I control docs/update?
 E.g. if there are 50K docs per file, I'm thinking that I should probably
 code up my own DataSource that lets me stipulate docs/update, say 100
 instead of 50K. Does this make sense?

 Mike


 Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
  Hi,
  You have not registered any data sources. The second entity needs a
  data source. Remove the dataSource="null" and add a name for the second
  entity (good practice). There is no need for a baseDir attribute on the
  second entity. See the modified XML below.
  --Noble
 
  <dataConfig>
    <dataSource type="FileDataSource"/>
    <document>
      <entity name="f" processor="FileListEntityProcessor" fileName=".*xml"
              newerThan="'NOW-10DAYS'" recursive="true" rootEntity="false"
              dataSource="null" baseDir="/san/tomcat-services/solr-medline">
        <entity name="x" processor="XPathEntityProcessor"
                forEach="/MedlineCitation"
                url="${f.fileAbsolutePath}">
          <field column="pmid" xpath="/MedlineCitation/PMID"/>
        </entity>
      </entity>
    </document>
  </dataConfig>
 
  On Tue, Jun 24, 2008 at 6:39 AM, mike segv [EMAIL PROTECTED] wrote:
 
  I'm trying to use the FileListEntityProcessor to add some XML documents
  to a Solr index. I'm running a nightly version of solr-1.3 with SOLR-469
  and SOLR-563. I've been able to successfully run the Slashdot
  HttpDataSource example. My data-config.xml file loads without errors.
  When I attempt the full-import command, I get the exception below.
  Thanks for any help.
 
  Mike
 
  WARNING: No lockType configured for
  /san/tomcat-services/solr-medline/solr/data/index/ assuming 'simple'
  Jun 23, 2008 7:59:49 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
  SEVERE: Full Import failed
  java.lang.RuntimeException: java.lang.NullPointerException
      at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:97)
      at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:212)
      at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:166)
      at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:149)
      at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:286)
      at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:312)
      at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
      at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:140)
      at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:335)
      at org.apache.solr.handler.dataimport.DataImporter.rumCmd(DataImporter.java:386)
      at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
  Caused by: java.lang.NullPointerException
      at java.io.Reader.<init>(Reader.java:61)
      at java.io.BufferedReader.<init>(BufferedReader.java:76)
      at com.bea.xml.stream.MXParser.checkForXMLDecl(MXParser.java:775)
      at com.bea.xml.stream.MXParser.setInput(MXParser.java:806)
      at com.bea.xml.stream.MXParserFactory.createXMLStreamReader(MXParserFactory.java:261)
      at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:93)
      ... 10 more
 
  Here is my data-config:

  <dataConfig>
    <document>
      <entity name="f" processor="FileListEntityProcessor" fileName=".*xml"
              newerThan="'NOW-10DAYS'" recursive="true" rootEntity="false"
              dataSource="null" baseDir="/san/tomcat-services/solr-medline">
        <entity processor="XPathEntityProcessor" forEach="/MedlineCitation"
                url="${f.fileAbsolutePath}" dataSource="null">
          <field column="pmid" xpath="/MedlineCitation/PMID"/>
        </entity>
      </entity>
    </document>
  </dataConfig>

  And a snippet from an XML file:
  <MedlineCitation Owner="PIP" Status="MEDLINE">
    <PMID>12236137</PMID>
    <DateCreated>
      <Year>1980</Year>
      <Month>01</Month>
      <Day>03</Day>
    </DateCreated>
 
 
  --
  View this message in context:
 
 http://www.nabble.com/Attempting-dataimport-using-FileListEntityProcessor-tp18081671p18081671.html
  Sent from the Solr - User mailing list archive at Nabble.com

Attempting dataimport using FileListEntityProcessor

2008-06-23 Thread mike segv

I'm trying to use the FileListEntityProcessor to add some XML documents to a
Solr index. I'm running a nightly version of solr-1.3 with SOLR-469 and
SOLR-563. I've been able to successfully run the Slashdot HttpDataSource
example. My data-config.xml file loads without errors. When I attempt the
full-import command, I get the exception below. Thanks for any help.

Mike

WARNING: No lockType configured for
/san/tomcat-services/solr-medline/solr/data/index/ assuming 'simple'
Jun 23, 2008 7:59:49 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:97)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:212)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:166)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:149)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:286)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:312)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:140)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:335)
    at org.apache.solr.handler.dataimport.DataImporter.rumCmd(DataImporter.java:386)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
Caused by: java.lang.NullPointerException
    at java.io.Reader.<init>(Reader.java:61)
    at java.io.BufferedReader.<init>(BufferedReader.java:76)
    at com.bea.xml.stream.MXParser.checkForXMLDecl(MXParser.java:775)
    at com.bea.xml.stream.MXParser.setInput(MXParser.java:806)
    at com.bea.xml.stream.MXParserFactory.createXMLStreamReader(MXParserFactory.java:261)
    at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:93)
    ... 10 more
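The `Caused by` section shows the NullPointerException thrown from the `java.io.Reader` constructor, reached through the `BufferedReader` constructor that the StAX parser calls on its input. A minimal sketch of that JDK behavior, assuming (in line with Noble's diagnosis in this thread that the XPath entity had no data source) that the parser was effectively handed a null `Reader`. `NullReaderDemo` and `failsToWrap` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;

public class NullReaderDemo {
    // Returns true if wrapping `in` in a BufferedReader throws a NullPointerException.
    // BufferedReader's constructor passes `in` to Reader's constructor, which rejects null.
    static boolean failsToWrap(Reader in) {
        try {
            new BufferedReader(in);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Assumption: with no data source registered, the XPath entity ends up with no Reader (null).
        System.out.println(failsToWrap(null)); // true
        System.out.println(failsToWrap(new StringReader("<MedlineCitation/>"))); // false
    }
}
```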

Here is my data-config:

<dataConfig>
  <document>
    <entity name="f" processor="FileListEntityProcessor" fileName=".*xml"
            newerThan="'NOW-10DAYS'" recursive="true" rootEntity="false"
            dataSource="null" baseDir="/san/tomcat-services/solr-medline">
      <entity processor="XPathEntityProcessor" forEach="/MedlineCitation"
              url="${f.fileAbsolutePath}" dataSource="null">
        <field column="pmid" xpath="/MedlineCitation/PMID"/>
      </entity>
    </entity>
  </document>
</dataConfig>

And a snippet from an XML file:
<MedlineCitation Owner="PIP" Status="MEDLINE">
  <PMID>12236137</PMID>
  <DateCreated>
    <Year>1980</Year>
    <Month>01</Month>
    <Day>03</Day>
  </DateCreated>
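The `field` mapping in this config pulls `/MedlineCitation/PMID` out of each record. To sanity-check that path against the sample record outside Solr, here is a minimal sketch using the JDK's standard `javax.xml.xpath` API. This is not how DataImportHandler evaluates XPath (its XPathRecordReader streams the file and supports only a subset of XPath). The class name `PmidExtractor` is hypothetical, and the sample is the snippet above with a closing tag added.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class PmidExtractor {
    // Evaluates the same path as the config's field mapping (/MedlineCitation/PMID)
    // using the JDK's standard XPath API, not DIH's streaming XPathRecordReader.
    static String extractPmid(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate("/MedlineCitation/PMID", new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate XPath against input", e);
        }
    }

    public static void main(String[] args) {
        // The record from the post, with a closing </MedlineCitation> added so it is well-formed.
        String xml = "<MedlineCitation Owner=\"PIP\" Status=\"MEDLINE\">"
                + "<PMID>12236137</PMID>"
                + "<DateCreated><Year>1980</Year><Month>01</Month><Day>03</Day></DateCreated>"
                + "</MedlineCitation>";
        System.out.println(extractPmid(xml)); // 12236137
    }
}
```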





Re: [Update] Solr can be started from jetty but not tomcat

2008-06-19 Thread mike segv

The error messages caused by this problem are very misleading. After a lot
of trial and error, I got Solr to work with Tomcat by adding xalan.jar to
the libs directory and rebuilding the WAR file.
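Because the symptom was a misleading startup error rather than a plain "class not found", a quick probe can confirm whether a given class is visible to a class loader before rebuilding the WAR. A minimal sketch; `ClasspathProbe` is hypothetical, and `org.apache.xalan.processor.TransformerFactoryImpl` is assumed here as a representative Xalan class. For a Tomcat-relevant answer the check would have to run inside the webapp, with the webapp's class loader, not from a standalone JVM.

```java
public class ClasspathProbe {
    // Returns true if `className` can be loaded through `loader`.
    // Passing `false` to Class.forName skips running the class's static initializer.
    static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClasspathProbe.class.getClassLoader();
        // Assumption: a representative Xalan class name.
        String xalanClass = "org.apache.xalan.processor.TransformerFactoryImpl";
        System.out.println(xalanClass + " present: " + isPresent(xalanClass, loader));
    }
}
```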


Vinci wrote:
 
 Hi all,
 
 After several hours I have Solr partly working: the Jetty version works,
 but the Tomcat version doesn't.
 
 Environment: JRE 1.6, Tomcat 5.5, Ubuntu 7.10, Solr nightly (8 Mar 08)
 
 It looks like multicore.xml causes the problem... does Solr die while
 loading its config?
 
 In the localhost log:
 org.apache.catalina.core.StandardContext filterStart
 SEVERE: Exception starting filter SolrRequestFilter
 java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.core.SolrConfig
     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:114)
     at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
     at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
     at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:78)
     at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
     at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
     at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
     at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:740)
     at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544)
     at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
     at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
     at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
     at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138)
     at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
     at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
     at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022)
     at org.apache.catalina.core.StandardHost.start(StandardHost.java:736)
     at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014)
     at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
     at org.apache.catalina.core.StandardService.start(StandardService.java:448)
     at org.apache.catalina.core.StandardServer.start(StandardServer.java:700)
     at org.apache.catalina.startup.Catalina.start(Catalina.java:552)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295)
     at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:433)
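The `NoClassDefFoundError: Could not initialize class org.apache.solr.core.SolrConfig` in this log is usually a secondary symptom. The JVM reports it when a class's static initializer has already failed once, and that first failure surfaces as an `ExceptionInInitializerError`, so the earlier error is the one worth reading. A minimal sketch of that JVM behavior; `StaticInitDemo` and `FragileConfig` are hypothetical stand-ins, not Solr classes.

```java
public class StaticInitDemo {
    // Hypothetical class whose static initializer fails, standing in for a class like SolrConfig.
    static class FragileConfig {
        static final String VALUE = load();

        static String load() {
            throw new IllegalStateException("simulated missing dependency");
        }

        static void touch() { }
    }

    // Uses FragileConfig (triggering class initialization) and returns the simple name of any error.
    static String attempt() {
        try {
            FragileConfig.touch();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(attempt()); // ExceptionInInitializerError: first use runs the failing initializer
        System.out.println(attempt()); // NoClassDefFoundError: "Could not initialize class ..."
    }
}
```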
 
 
 
 Catalina log:
 org.apache.solr.servlet.SolrDispatchFilter init
 INFO: SolrDispatchFilter.init()
 org.apache.solr.core.SolrResourceLoader locateInstanceDir
 INFO: Using JNDI solr.home: /var/webapps/solr
 org.apache.solr.servlet.SolrDispatchFilter init
 INFO: looking for multicore.xml: /var/webapps/solr/multicore.xml
 org.apache.solr.servlet.SolrDispatchFilter init
 SEVERE: Could not start SOLR. Check solr/home property
 java.lang.ExceptionInInitializerError
     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:104)
     at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
     at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
     at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:78)
     at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
     at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
     at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
     at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:740)
     at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544)
     at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
     at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
     at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
     at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138)
     at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
 at