Re: Creating Solr index from map/reduce
Thanks Alexander

2011/1/3 Alexander Kanarsky kanarsky2...@gmail.com

> Joan,
>
> The current version of the patch assumes the location and names of the schema and solrconfig files ($SOLR_HOME/conf); this is hardcoded (see the SolrRecordWriter's constructor). Multi-core configuration with separate configuration locations via solr.xml is not supported for now. As a workaround, you could link or copy the schema and solrconfig files to follow the hardcoded assumption.
>
> Thanks,
> -Alexander
Re: Creating Solr index from map/reduce
Joan,

The current version of the patch assumes the location and names of the schema and solrconfig files ($SOLR_HOME/conf); this is hardcoded (see the SolrRecordWriter's constructor). Multi-core configuration with separate configuration locations via solr.xml is not supported for now. As a workaround, you could link or copy the schema and solrconfig files to follow the hardcoded assumption.

Thanks,
-Alexander

On Wed, Dec 29, 2010 at 2:50 AM, Joan joan.monp...@gmail.com wrote:

> If I rename my custom schema file (schema-xx.xml), which is located in SOLR_HOME/schema/, copy it to the conf folder and then run CSVIndexer, I get a different error:
>
>     Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or '/tmp/hadoop-root/mapred/local/taskTracker/archive/localhost/tmp/b7611d6d-9cc7-4237-a240-96ecaab9f21a.solr.zip/conf/'
>
> I don't understand this, because I have a Solr configuration file (solr.xml) where I define all the cores:
>
>     <core name="core_name" instanceDir="solr-data/index" config="solr/conf/solrconfig_xx.xml" schema="solr/schema/schema_xx.xml" properties="solr/conf/solrcore.properties"/>
>
> But I think that when I run CSVIndexer, it doesn't know that solr.xml exists, and it looks for schema.xml and solrconfig.xml by default in the default folder (conf).
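A minimal sketch of the copy workaround suggested in Alexander's reply, assuming a Solr home laid out as in this thread (custom files kept under names other than the defaults). The function name `stage_default_conf` and its parameters are hypothetical and are not part of the SOLR-1301 patch; the only details taken from the thread are the target names conf/schema.xml and conf/solrconfig.xml that appear in the error messages.

```python
import shutil
from pathlib import Path


def stage_default_conf(solr_home, schema_src, solrconfig_src):
    """Copy custom config files to the default names under <solr_home>/conf.

    solr_home      -- directory later passed to CSVIndexer via -solr
    schema_src     -- path to the custom schema file (any name)
    solrconfig_src -- path to the custom solrconfig file (any name)
    Returns the list of target paths, as strings.
    """
    conf_dir = Path(solr_home) / "conf"
    conf_dir.mkdir(parents=True, exist_ok=True)

    # Target names are the ones reported in the thread's error messages.
    targets = {
        conf_dir / "schema.xml": Path(schema_src),
        conf_dir / "solrconfig.xml": Path(solrconfig_src),
    }
    for dst, src in targets.items():
        if not src.is_file():
            raise FileNotFoundError(f"missing source file: {src}")
        if src.resolve() == dst.resolve():
            continue  # already in place under the default name
        # Copying is the more conservative option here; linking is the
        # other workaround mentioned in the reply.
        shutil.copyfile(src, dst)
    return sorted(str(p) for p in targets)
```

For the layout described in the thread, a call might look like `stage_default_conf("/SOLR_HOME", "/SOLR_HOME/schema/schema-xx.xml", "/SOLR_HOME/conf/solrconfig_xx.xml")`, run before submitting the CSVIndexer job. The originals are left untouched, so an existing multi-core setup keeps working.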
Creating Solr index from map/reduce
Hi,

I'm trying to generate a Solr index from Hadoop (map/reduce), so I'm using the patch SOLR-1301 (https://issues.apache.org/jira/browse/SOLR-1301), but I can't get it to work.

CSVIndexer takes these arguments: the Solr index directory, -solr followed by the Solr home, and the input (in this case a CSV file). I run it like this:

    HADOOP_INSTALL/bin/hadoop jar my.jar CSVIndexer INDEX_FOLDER -solr /SOLR_HOME CSV FILE PATH

Before running CSVIndexer, I put the CSV file into HDFS.

My Solr home doesn't use the default configuration files. It is divided into multiple folders:

    /conf
    /schema

Because my Solr configuration files have custom names, CSVIndexer can't find schema.xml. That file doesn't exist: in my case it is named schema-xx.xml, and CSVIndexer looks for it inside the conf folder without knowing that the schema folder exists. I also have a Solr configuration file (solr.xml) where I configure multiple cores. I tried modifying Solr's paths, but it still doesn't work.

As I understand it, CSVIndexer copies the specified Solr home into HDFS (/tmp/hadoop-user/mapred/local/taskTracker/archive/...), and when it tries to find schema.xml there, the file doesn't exist:

    10/12/29 10:18:11 INFO mapred.JobClient: Task Id : attempt_201012291016_0002_r_00_1, Status : FAILED
    java.lang.IllegalStateException: Failed to initialize record writer for my.jar, attempt_201012291016_0002_r_00_1
        at org.apache.solr.hadoop.SolrRecordWriter.init(SolrRecordWriter.java:253)
        at org.apache.solr.hadoop.SolrOutputFormat.getRecordWriter(SolrOutputFormat.java:152)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
    Caused by: java.io.FileNotFoundException: Source '/tmp/hadoop-guest/mapred/local/taskTracker/archive/localhost/tmp/e8be5bb1-e910-47a1-b5a7-1352dfec2b1f.solr.zip/conf/schema.xml' does not exist
        at org.apache.commons.io.FileUtils.copyFile(FileUtils.java:636)
        at org.apache.commons.io.FileUtils.copyFile(FileUtils.java:606)
        at org.apache.solr.hadoop.SolrRecordWriter.init(SolrRecordWriter.java:222)
        ... 4 more
Re: Creating Solr index from map/reduce
If I rename my custom schema file (schema-xx.xml), which is located in SOLR_HOME/schema/, copy it to the conf folder and then run CSVIndexer, I get a different error:

    Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or '/tmp/hadoop-root/mapred/local/taskTracker/archive/localhost/tmp/b7611d6d-9cc7-4237-a240-96ecaab9f21a.solr.zip/conf/'

I don't understand this, because I have a Solr configuration file (solr.xml) where I define all the cores:

    <core name="core_name" instanceDir="solr-data/index" config="solr/conf/solrconfig_xx.xml" schema="solr/schema/schema_xx.xml" properties="solr/conf/solrcore.properties"/>

But I think that when I run CSVIndexer, it doesn't know that solr.xml exists, and it looks for schema.xml and solrconfig.xml by default in the default folder (conf).
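Since both errors in this thread come from files missing under conf/ in the shipped Solr home, a quick local check before submitting the job can save a failed reduce attempt. The following is only a sketch: the helper name `missing_conf_files` is hypothetical, and the list of expected files is an assumption based solely on the two file names that appear in the error messages here, not on the patch source.

```python
from pathlib import Path

# File names reported as missing in this thread's error messages.
# Assumption: the job may require other files that are not listed here.
EXPECTED_CONF_FILES = ("schema.xml", "solrconfig.xml")


def missing_conf_files(solr_home):
    """Return the expected config paths that are absent under <solr_home>/conf."""
    conf_dir = Path(solr_home) / "conf"
    return [
        str(conf_dir / name)
        for name in EXPECTED_CONF_FILES
        if not (conf_dir / name).is_file()
    ]
```

An empty list means both default-named files are present locally; it says nothing about whether their contents are valid for the job.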