[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920402#action_12920402 ]
Dhruv Bansal commented on SOLR-1301:
------------------------------------

I am unable to compile SOLR 1.4.1 after patching with the latest (2010-09-20 04:40 AM) SOLR-1301.patch.

{code:borderStyle=solid}
$ wget http://mirror.cloudera.com/apache//lucene/solr/1.4.1/apache-solr-1.4.1.tgz
...
$ tar -xzf apache-solr-1.4.1.tgz
$ cd apache-solr-1.4.1/contrib
apache-solr-1.4.1/contrib$ wget https://issues.apache.org/jira/secure/attachment/12455023/SOLR-1301.patch
apache-solr-1.4.1/contrib$ patch -p2 -i SOLR-1301.patch
...
apache-solr-1.4.1/contrib$ mkdir lib
apache-solr-1.4.1/contrib$ cd lib
apache-solr-1.4.1/contrib/lib$ wget .. # download hadoop, log4j, commons-logging, commons-logging-api jars from top of this page
...
apache-solr-1.4.1/contrib/lib$ cd ../..
apache-solr-1.4.1$ ant dist -k
...
compile:
    [javac] Compiling 9 source files to /home/dhruv/projects/infochimps/search/apache-solr-1.4.1/contrib/hadoop/build/classes

Target 'compile' failed with message 'The following error occurred while executing this line:
/home/dhruv/projects/infochimps/search/apache-solr-1.4.1/common-build.xml:159: Reference lucene.classpath not found.'.
Cannot execute 'build' - 'compile' failed or was not executed.
Cannot execute 'dist' - 'build' failed or was not executed.
  [subant] File '/home/dhruv/projects/infochimps/search/apache-solr-1.4.1/contrib/hadoop/build.xml' failed with message 'The following error occurred while executing this line:
  [subant] /home/dhruv/projects/infochimps/search/apache-solr-1.4.1/contrib/hadoop/build.xml:65: The following error occurred while executing this line:
  [subant] /home/dhruv/projects/infochimps/search/apache-solr-1.4.1/common-build.xml:159: Reference lucene.classpath not found.'.
....
{code}

Am I following the procedure properly? I'm able to build SOLR just fine out of the box as well as after applying [SOLR-1395|https://issues.apache.org/jira/browse/SOLR-1395].

> Solr + Hadoop
> -------------
>
>                 Key: SOLR-1301
>                 URL: https://issues.apache.org/jira/browse/SOLR-1301
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4
>            Reporter: Andrzej Bialecki
>             Fix For: Next
>
>         Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java
>
>
> This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold:
> * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat
> * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network.
>
> Design
> ----------
> Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument.
> This data is then added to a batch, which is periodically submitted to the EmbeddedSolrServer. When the reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer.
> The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken.
> This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-NNNNN directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard.
> An example application is provided that processes large CSV files and uses this API. It uses custom CSV processing to avoid (de)serialization overhead.
> This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue; you should put it in contrib/hadoop/lib.
> Note: the development of this patch was sponsored by an anonymous contributor and approved for release under the Apache License.
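
For anyone trying to picture how the pieces described above fit together, here is a rough sketch of a CSV indexing job against the old (org.apache.hadoop.mapred) API. This is not code from the patch: the package of SolrOutputFormat and SolrDocumentConverter, the converter's method signature, and the two configuration keys used to point at solr.home and the converter class are assumptions that may well differ from what the patch actually exposes; only the stock Hadoop and SolrJ calls are standard.

{code:borderStyle=solid}
import java.util.Collection;
import java.util.Collections;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.solr.common.SolrInputDocument;

// SolrOutputFormat and SolrDocumentConverter come from this patch; their
// package and exact signatures are assumed here for illustration only.

public class CsvIndexJob {

  // Hypothetical converter: turns one CSV line into a single SolrInputDocument.
  // The real SolrDocumentConverter contract in the patch may differ.
  public static class CsvLineConverter extends SolrDocumentConverter<LongWritable, Text> {
    @Override
    public Collection<SolrInputDocument> convert(LongWritable offset, Text line) {
      String[] cols = line.toString().split(",");
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", cols[0]);            // illustrative schema fields
      doc.addField("text_t", line.toString());
      return Collections.singletonList(doc);
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(CsvIndexJob.class);
    conf.setJobName("csv-to-solr-shards");

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    // Each reducer leaves a partial Solr home (part-NNNNN) under this directory.
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    conf.setOutputFormat(SolrOutputFormat.class);
    conf.setNumReduceTasks(1);  // one reducer => a single output shard

    // How solr.home and the converter are wired up is patch-specific; these
    // property names are placeholders, not the patch's actual keys.
    conf.set("solr.home", "/path/to/existing/solr/home");
    conf.set("solr.document.converter.class", CsvLineConverter.class.getName());

    JobClient.runJob(conf);
  }
}
{code}

If that matches the patch's actual wiring, the resulting part-00000 directory could then be pulled off HDFS and served as a single shard, as the description says.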