[ 
https://issues.apache.org/jira/browse/FLUME-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710118#comment-13710118
 ] 

wolfgang hoschek commented on FLUME-1687:
-----------------------------------------

My understanding of FLUME-1687 is that it simply forwards the Flume headers 
as-is to Solr, i.e. it expects an upstream component to send Flume events 
that are already formatted exactly as Solr requires. I think it also doesn't 
support SolrCloud.

In contrast, Morphline Solr Sink is well suited for use cases that stream raw 
data into HDFS (via the HdfsSink) and simultaneously extract, transform and 
load the same data into Solr. In particular, the Morphline Solr Sink can 
process arbitrary heterogeneous raw data from disparate data sources and turn 
it into a data model that is useful to Search applications. The ETL 
functionality is customizable using a morphline configuration file that defines 
a chain of pluggable transformation commands that pipe event records from one 
command to another. The Morphline Solr Sink also supports SolrCloud and 
transactional batching for more scalability, as well as Solr collection 
aliases (e.g. for transparent expiry of old index partitions).
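
To illustrate the idea of a chain of commands that pipe records from one 
command to the next, here is a rough conceptual sketch in Python. It is not 
the Morphline API or its configuration syntax; the command names (read_line, 
drop_blank, to_solr_doc), the record fields and the run_chain helper are all 
hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical record type: a plain dict of field name -> value.
Record = Dict[str, object]
# A command consumes a stream of records and produces a new stream.
Command = Callable[[Iterator[Record]], Iterator[Record]]


def read_line(records: Iterator[Record]) -> Iterator[Record]:
    """Emit one record per line of each raw event body."""
    for record in records:
        for line in str(record["body"]).splitlines():
            yield {**record, "message": line}


def drop_blank(records: Iterator[Record]) -> Iterator[Record]:
    """Discard records whose message is empty or whitespace only."""
    for record in records:
        if str(record["message"]).strip():
            yield record


def to_solr_doc(records: Iterator[Record]) -> Iterator[Record]:
    """Map each record onto an assumed search schema (id, text)."""
    for n, record in enumerate(records):
        yield {"id": f'{record["source"]}:{n}', "text": record["message"]}


def run_chain(commands: List[Command], events: Iterable[Record]) -> List[Record]:
    """Pipe the events through each command in order and collect the result."""
    stream: Iterator[Record] = iter(events)
    for command in commands:
        stream = command(stream)
    return list(stream)
```

A call such as run_chain([read_line, drop_blank, to_solr_doc], events) would 
then turn raw events into documents ready for indexing. In the real sink, the 
equivalent chain is declared in the morphline configuration file rather than 
in code, so the ETL logic can be changed without recompiling.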

Morphline Solr Sink can do everything that FLUME-1687 can do, and more.

It would be nice to merge those two efforts into one.


                
> ApacheSolrSink
> --------------
>
>                 Key: FLUME-1687
>                 URL: https://issues.apache.org/jira/browse/FLUME-1687
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>    Affects Versions: v1.2.0, v1.4.0
>            Reporter: wolfgang hoschek
>            Assignee: Israel Ekpo
>         Attachments: flume-new-feature-dependencies.zip, 
> flume-new-features-1.3.1.jar, flume-new-features-1.3.1-sources.jar
>
>
> Some use cases need near real time full text indexing of data through Flume 
> into Solr, where a Flume sink can write directly to a Solr search server. 
> This is a scalable way to provide low latency querying and data acquisition. 
> It complements (rather than replaces) use cases based on Map Reduce batch 
> analysis of HDFS data.
> Apache Solr has a client API that uses REST to add documents to a Solr 
> server, which in turn is based on Lucene. A Solr Sink can extract documents 
> from flume events and forward them to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
