[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842555#comment-13842555 ]
Steve Rowe commented on SOLR-1301:
----------------------------------

The Maven Jenkins build on trunk has been failing for a while because {{com.sun.jersey:jersey-bundle:1.8}}, a morphlines-core dependency, causes {{ant validate-maven-dependencies}} to fail - here's a log excerpt from the most recent failure [https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1046/console]:

{noformat}
     [echo] Building solr-map-reduce...

-validate-maven-dependencies.init:

-validate-maven-dependencies:
[artifact:dependencies] [INFO] snapshot org.apache.solr:solr-cell:5.0-SNAPSHOT: checking for updates from maven-restlet
[artifact:dependencies] [INFO] snapshot org.apache.solr:solr-cell:5.0-SNAPSHOT: checking for updates from releases.cloudera.com
[artifact:dependencies] [INFO] snapshot org.apache.solr:solr-morphlines-cell:5.0-SNAPSHOT: checking for updates from maven-restlet
[artifact:dependencies] [INFO] snapshot org.apache.solr:solr-morphlines-cell:5.0-SNAPSHOT: checking for updates from releases.cloudera.com
[artifact:dependencies] [INFO] snapshot org.apache.solr:solr-morphlines-core:5.0-SNAPSHOT: checking for updates from maven-restlet
[artifact:dependencies] [INFO] snapshot org.apache.solr:solr-morphlines-core:5.0-SNAPSHOT: checking for updates from releases.cloudera.com
[artifact:dependencies] An error has occurred while processing the Maven artifact tasks.
[artifact:dependencies]  Diagnosis:
[artifact:dependencies]
[artifact:dependencies] Unable to resolve artifact: Unable to get dependency information: Unable to read the metadata file for artifact 'com.sun.jersey:jersey-bundle:jar': Cannot find parent: com.sun.jersey:jersey-project for project: null:jersey-bundle:jar:null for project null:jersey-bundle:jar:null
[artifact:dependencies]   com.sun.jersey:jersey-bundle:jar:1.8
[artifact:dependencies]
[artifact:dependencies] from the specified remote repositories:
[artifact:dependencies]   central (http://repo1.maven.org/maven2),
[artifact:dependencies]   releases.cloudera.com (https://repository.cloudera.com/artifactory/libs-release),
[artifact:dependencies]   maven-restlet (http://maven.restlet.org),
[artifact:dependencies]   Nexus (http://repository.apache.org/snapshots)
[artifact:dependencies]
[artifact:dependencies] Path to dependency:
[artifact:dependencies]   1) org.apache.solr:solr-map-reduce:jar:5.0-SNAPSHOT
[artifact:dependencies]
[artifact:dependencies]
[artifact:dependencies] Not a v4.0.0 POM. for project com.sun.jersey:jersey-project at /home/hudson/.m2/repository/com/sun/jersey/jersey-project/1.8/jersey-project-1.8.pom
{noformat}

I couldn't reproduce the failure locally. Turns out the parent POM in question, at {{/home/hudson/.m2/repository/com/sun/jersey/jersey-project/1.8/jersey-project-1.8.pom}}, has the wrong contents - an nginx redirect page rather than a POM:

{noformat}
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/0.6.39</center>
</body>
</html>
{noformat}

I fixed this by manually downloading the correct POM and its checksum file from Maven Central and putting them in the hudson user's local Maven repository.
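(Context for why a single bad cached file poisons the whole resolution: per the error and cache path above, {{jersey-bundle-1.8.pom}} declares {{com.sun.jersey:jersey-project:1.8}} as its parent, and Maven must parse the parent before it can use the module's own POM. A sketch of what that parent declaration looks like - coordinates taken from the log, exact formatting not verified against the real file:

{code:xml}
<!-- Sketch of the parent declaration in jersey-bundle-1.8.pom; the coordinates
     come from the resolution error and the cache path above. Maven has to
     download and parse this parent POM before it can interpret jersey-bundle's
     own POM, which is why an HTML error page cached at the parent's path fails
     with "Not a v4.0.0 POM". -->
<parent>
  <groupId>com.sun.jersey</groupId>
  <artifactId>jersey-project</artifactId>
  <version>1.8</version>
</parent>
{code}
)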
[~markrmil...@gmail.com]: While investigating this failure, I tried dropping the triggering Ivy dependency {{com.sun.jersey:jersey-bundle}}, and all enabled tests succeed. Okay with you to drop this dependency? The description from the POM says:

{code:xml}
<description>
  A bundle containing code of all jar-based modules that provide JAX-RS and Jersey-related features.
  Such a bundle is *only intended* for developers that do not use Maven's dependency system.
  The bundle does not include code for contributes, tests and samples.
</description>
{code}

Sounds like it's a sneaky replacement for transitive dependencies? IMHO, if we need some of the classes this jar provides, we should declare direct dependencies on the appropriate artifacts.
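If we go that route, here's a minimal sketch of what the direct declarations could look like in the relevant POM template - note that {{jersey-core}} and {{jersey-server}} are illustrative guesses; the actual artifact set would have to be derived from the classes we really reference:

{code:xml}
<!-- Hypothetical sketch: direct dependencies on individual Jersey 1.8 modules
     in place of the jersey-bundle uber-jar. Which modules (if any) are actually
     needed by morphlines-core is an assumption to be verified. -->
<dependency>
  <groupId>com.sun.jersey</groupId>
  <artifactId>jersey-core</artifactId>
  <version>1.8</version>
</dependency>
<dependency>
  <groupId>com.sun.jersey</groupId>
  <artifactId>jersey-server</artifactId>
  <version>1.8</version>
</dependency>
{code}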
> Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-1301
>                 URL: https://issues.apache.org/jira/browse/SOLR-1301
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Andrzej Bialecki
>            Assignee: Mark Miller
>             Fix For: 5.0, 4.7
>
>         Attachments: README.txt, SOLR-1301-hadoop-0-20.patch,
> SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch,
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch,
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch,
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch,
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch,
> SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar,
> commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar,
> hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch,
> log4j-1.2.15.jar
>
>
> This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold:
> * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat
> * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network.
>
> Design
> ----------
> Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When a reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer.
> The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken.
> This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-NNNNN directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard.
> An example application is provided that processes large CSV files and uses this API. It uses custom CSV processing to avoid (de)serialization overhead.
> This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib.
> Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License.

--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org