Re: Creating Solr index from map/reduce

2011-01-10 Thread Joan
Thanks Alexander




Re: Creating Solr index from map/reduce

2011-01-03 Thread Alexander Kanarsky
Joan,

The current version of the patch assumes the location and names of the
schema and solrconfig files ($SOLR_HOME/conf); this is hardcoded (see
the SolrRecordWriter constructor). Multi-core configuration with
separate configuration locations via solr.xml is not supported for
now. As a workaround, you could link or copy the schema and
solrconfig files to match the hardcoded assumption.
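
The copy variant of this workaround could be sketched as below. This is only a minimal illustration, not part of the patch: the function name stage_solr_home and the staged directory are hypothetical, and the target names schema.xml and solrconfig.xml under conf/ are taken from the error messages reported in this thread.

```python
import shutil
import tempfile
from pathlib import Path


def stage_solr_home(schema_src, solrconfig_src, staged_home):
    """Copy custom config files to <staged_home>/conf under the default names.

    Hypothetical helper: the default names are the ones the error messages
    in this thread show the patch looking for.
    """
    conf_dir = Path(staged_home) / "conf"
    conf_dir.mkdir(parents=True, exist_ok=True)
    # Copy rather than link, so the staged home has no external dependencies.
    shutil.copyfile(schema_src, conf_dir / "schema.xml")
    shutil.copyfile(solrconfig_src, conf_dir / "solrconfig.xml")
    return conf_dir


if __name__ == "__main__":
    # Demo with throwaway files standing in for the real configuration.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        schema = tmp / "schema-xx.xml"
        solrconfig = tmp / "solrconfig_xx.xml"
        schema.write_text("<schema/>")
        solrconfig.write_text("<config/>")
        conf = stage_solr_home(schema, solrconfig, tmp / "staged-solr-home")
        print(sorted(p.name for p in conf.iterdir()))
```

The staged directory would then be the one passed to CSVIndexer after -solr, instead of the original multi-folder Solr home.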

Thanks,
-Alexander




Creating Solr index from map/reduce

2010-12-29 Thread Joan
Hi,

I'm trying to generate a Solr index from Hadoop (map/reduce), so I'm using the
SOLR-1301 patch (https://issues.apache.org/jira/browse/SOLR-1301), but I can't
get it to work.

I run CSVIndexer with these arguments: the Solr index directory, -solr followed
by the Solr home, and the input (in this case a CSV file):

HADOOP_INSTALL/bin/hadoop jar my.jar CSVIndexer INDEX_FOLDER -solr /SOLR_HOME CSV_FILE_PATH

Before running CSVIndexer, I put the CSV file into HDFS.

My Solr home doesn't use the default configuration layout; it is divided into
multiple folders:

/conf
/schema

Because I have custom Solr configuration files, CSVIndexer can't find schema.xml.
Obviously it can't, because that file doesn't exist: in my case the file is named
schema-xx.xml, and CSVIndexer looks for it inside the conf folder and doesn't know
that the schema folder exists. I also have a Solr configuration file (solr.xml)
where I configure multiple cores.

I tried modifying Solr's paths, but it still doesn't work.

As I understand it, CSVIndexer copies the specified Solr home into HDFS
(/tmp/hadoop-user/mapred/local/taskTracker/archive/...), and when it tries to
find schema.xml, the file doesn't exist:

10/12/29 10:18:11 INFO mapred.JobClient: Task Id : attempt_201012291016_0002_r_00_1, Status : FAILED
java.lang.IllegalStateException: Failed to initialize record writer for my.jar, attempt_201012291016_0002_r_00_1
        at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:253)
        at org.apache.solr.hadoop.SolrOutputFormat.getRecordWriter(SolrOutputFormat.java:152)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.FileNotFoundException: Source '/tmp/hadoop-guest/mapred/local/taskTracker/archive/localhost/tmp/e8be5bb1-e910-47a1-b5a7-1352dfec2b1f.solr.zip/conf/schema.xml' does not exist
        at org.apache.commons.io.FileUtils.copyFile(FileUtils.java:636)
        at org.apache.commons.io.FileUtils.copyFile(FileUtils.java:606)
        at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:222)
        ... 4 more
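
A failure like this only surfaces once a reduce task starts, so a local check before submitting the job may save a round trip. The sketch below is a hypothetical helper, not part of the patch or of CSVIndexer; it assumes the two file names that the error messages in this thread mention, both directly under conf/ in the Solr home.

```python
import tempfile
from pathlib import Path

# File names the thread reports the patch looking for under <solr home>/conf.
REQUIRED_CONF_FILES = ("schema.xml", "solrconfig.xml")


def missing_conf_files(solr_home):
    """Return the required file names that are absent from <solr_home>/conf."""
    conf_dir = Path(solr_home) / "conf"
    return [name for name in REQUIRED_CONF_FILES
            if not (conf_dir / name).is_file()]


if __name__ == "__main__":
    # Demo: a throwaway Solr home that contains only conf/schema.xml.
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "conf"
        conf.mkdir()
        (conf / "schema.xml").write_text("<schema/>")
        print("missing:", missing_conf_files(tmp))
```

An empty list means the layout matches what the job expects; anything else names the files to copy or link into conf/ first.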


Re: Creating Solr index from map/reduce

2010-12-29 Thread Joan
If I rename my custom schema file (schema-xx.xml), which is located in
SOLR_HOME/schema/, copy it to the conf folder, and then run CSVIndexer,
I get a different error:

Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or '/tmp/hadoop-root/mapred/local/taskTracker/archive/localhost/tmp/b7611d6d-9cc7-4237-a240-96ecaab9f21a.solr.zip/conf/'

I don't understand this, because I have a Solr configuration file (solr.xml)
where I define all cores:

  <core name="core_name"
        instanceDir="solr-data/index"
        config="solr/conf/solrconfig_xx.xml"
        schema="solr/schema/schema_xx.xml"
        properties="solr/conf/solrcore.properties"/>

But I think that when I run CSVIndexer, it doesn't know that solr.xml exists,
and it looks for schema.xml and solrconfig.xml in the default folder (conf).
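
Since the job does not seem to read solr.xml, one option is to read it yourself to find out which files each core actually uses. The sketch below only extracts the config and schema attributes of each core element; the surrounding solr/cores wrapper elements in the sample are an assumption, and the function name core_config_paths is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample modeled on the core definition above; the <solr>/<cores> wrappers
# are assumed for illustration.
SAMPLE_SOLR_XML = """
<solr>
  <cores>
    <core name="core_name"
          instanceDir="solr-data/index"
          config="solr/conf/solrconfig_xx.xml"
          schema="solr/schema/schema_xx.xml"
          properties="solr/conf/solrcore.properties"/>
  </cores>
</solr>
"""


def core_config_paths(solr_xml_text):
    """Map each core name to its (config, schema) attribute values."""
    root = ET.fromstring(solr_xml_text.strip())
    # iter() finds <core> elements at any depth, whatever the wrapper layout.
    return {core.get("name"): (core.get("config"), core.get("schema"))
            for core in root.iter("core")}


if __name__ == "__main__":
    for name, (config, schema) in core_config_paths(SAMPLE_SOLR_XML).items():
        print(name, config, schema)
```

The paths are returned exactly as written in solr.xml; how they resolve against the Solr home or instanceDir is left to the reader to confirm for their setup.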


