[ https://issues.apache.org/jira/browse/FLUME-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710118#comment-13710118 ]
wolfgang hoschek commented on FLUME-1687:
-----------------------------------------

My understanding of FLUME-1687 is that it simply forwards the Flume event
headers as-is to Solr, i.e. it expects an upstream component to send Flume
events that are already formatted exactly as Solr requires. I think it also
doesn't support SolrCloud.

In contrast, Morphline Solr Sink is well suited for use cases that stream raw
data into HDFS (via the HdfsSink) and simultaneously extract, transform and
load the same data into Solr. In particular, the Morphline Solr Sink can
process arbitrary heterogeneous raw data from disparate data sources and turn
it into a data model that is useful to search applications. The ETL
functionality is customizable using a morphline configuration file that
defines a chain of pluggable transformation commands that pipe event records
from one command to another. The Morphline Solr Sink also supports SolrCloud
and transactional batching for more scalability, and Solr collection aliases
(e.g. for transparent expiry of old index partitions).

Morphline Solr Sink can do everything that FLUME-1687 can do, and more. It
would be nice to merge those two efforts into one.

> ApacheSolrSink
> --------------
>
>                 Key: FLUME-1687
>                 URL: https://issues.apache.org/jira/browse/FLUME-1687
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>    Affects Versions: v1.2.0, v1.4.0
>            Reporter: wolfgang hoschek
>            Assignee: Israel Ekpo
>         Attachments: flume-new-feature-dependencies.zip,
> flume-new-features-1.3.1.jar, flume-new-features-1.3.1-sources.jar
>
>
> Some use cases need near real time full text indexing of data through Flume
> into Solr, where a Flume sink can write directly to a Solr search server.
> This is a scalable way to provide low latency querying and data acquisition.
> It complements (rather than replaces) use cases based on Map Reduce batch
> analysis of HDFS data.
> Apache Solr has a client API that uses REST to add documents to a Solr
> server, which in turn is based on Lucene. A Solr Sink can extract documents
> from flume events and forward them to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
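
The idea in the comment of a chain of pluggable transformation commands that pipe event records from one command to another can be pictured with a short Python sketch. This is only an illustration of the concept, not the Morphlines API or its configuration syntax: the names read_line, split_key_values, drop_without_id and run_chain, and the record layout with a "body" field, are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical record type: a flat dict standing in for an event record.
Record = Dict[str, object]
# Hypothetical command type: returns a transformed record, or None to drop it.
Command = Callable[[Record], Optional[Record]]


def read_line(record: Record) -> Optional[Record]:
    """Decode the raw event body into a text field called 'message'."""
    body = record["body"]
    text = body.decode("utf-8") if isinstance(body, bytes) else str(body)
    out = dict(record)
    out["message"] = text.strip()
    return out


def split_key_values(record: Record) -> Optional[Record]:
    """Turn every 'key=value' token in the message into its own field."""
    out = dict(record)
    for token in str(record["message"]).split():
        if "=" in token:
            key, value = token.split("=", 1)
            out[key] = value
    return out


def drop_without_id(record: Record) -> Optional[Record]:
    """Drop records that lack an 'id' field (an assumed required field)."""
    return record if "id" in record else None


def run_chain(commands: List[Command], records: Iterable[Record]) -> List[Record]:
    """Pipe each record through the commands in order; collect the survivors."""
    docs: List[Record] = []
    for record in records:
        current: Optional[Record] = record
        for command in commands:
            current = command(current)
            if current is None:  # a command dropped the record
                break
        if current is not None:
            docs.append(current)
    return docs


events = [{"body": b"id=1 level=INFO\n"}, {"body": b"no identifier here"}]
docs = run_chain([read_line, split_key_values, drop_without_id], events)
print(docs)
```

In this sketch the surviving records stand in for the documents a sink would hand to Solr. Reordering or swapping functions in the list plays the role that editing the morphline configuration file plays in the real sink.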