Hi Ralph,
Here is the data.

MorphlineSolrSink and MorphlineInterceptor appear 13-14% of the time in this 
sample set.

org.apache.flume.channel.kafka.KafkaChannel                64%  
hdfs                                                       38%  
org.apache.flume.source.kafka.KafkaSource                  33%  
memory                                                     29%  
file                                                       28%  
spooldir                                                   26%  
null                                                       25%  
org.apache.flume.sink.kafka.KafkaSink                      25%  
Custom JMSSource                                           18%  
jms                                                        17%  
static                                                     14%  
org.apache.flume.sink.solr.morphline.MorphlineInterceptor  14%  
ElasticSearchSink                                          13%  
org.apache.flume.sink.solr.morphline.MorphlineSolrSink     13%  
host                                                       13%  
timestamp                                                  13%  
avro                                                       11%  
hbase                                                      11%

Let me know if you’d like to drill down further.

Tristan  

From: Ralph Goers <ralph.go...@dslextreme.com>
Reply: dev@flume.apache.org <dev@flume.apache.org>
Date: 15 January 2022 at 06:48:43
To: dev@flume.apache.org <dev@flume.apache.org>
Subject: Re: Morphlines-solr-sink  

I would like to see the data on the usage. I’m not sure how you would know 
since Cloudera doesn’t seem to include Flume in its products any more from what 
I can tell.  

The kite-morphlines project consists of 18 sub-modules plus 4 aggregation 
modules. That is a heck of a lot of stuff to try to drag in. I would prefer to 
fork the parts of kite we would need to a new flume-kite repo.  

It seems that the CVE the reporter mentioned does have a fix. It is available 
in parquet-avro 1.11.2 and 1.12.2. I was able to swap the new version for the 
old one even though the groupId has changed. That said, the kite-sdk dependency 
that includes it is marked as optional, so parquet-avro would be optional as 
well. So I have no idea if it is even used.  

In any case, the unit tests all pass with the updated dependency.  

Ralph  



> On Jan 14, 2022, at 3:33 PM, Tristan Stevens <tris...@apache.org> wrote:  
>  
> -1 from me.  
>  
> First, we can’t do that in a patch release, but that’s semantics.  
>  
> Both the Morphlines interceptor and the Morphlines-Solr-Sink are components 
> that are widely used amongst the community. I did some analysis last year 
> that I’ll dig out and share, but they are two of the most used components 
> after HDFS sink, Kafka and JMS.  
>  
> Whilst I agree it’s sucky that Cloudera aren’t supporting Kite anymore, I 
> wonder whether we can find a way to bring Morphlines into here, or otherwise 
> get upstream and fix the bits that need fixing.  
>  
> Tristan  
>  
>  
> From: Ralph Goers <ralph.go...@dslextreme.com>  
> Reply: dev@flume.apache.org <dev@flume.apache.org>  
> Date: 13 January 2022 at 15:26:12  
> To: dev@flume.apache.org <dev@flume.apache.org>  
> Subject: Morphlines-solr-sink  
>  
>> While I am not having any trouble building the morphline-solr-sink 
>> component, it is dependent on the abandoned kite-sdk, which makes its life 
>> very limited.  
>>  
>> In addition, the kite-sdk has a dependency on parquet-avro which, according 
>> to https://issues.apache.org/jira/browse/FLUME-3403, has vulnerabilities in 
>> every available release.  
>>  
>> Due to these factors I am going to remove the morphline-solr-sink module 
>> from Flume for the 1.10.0 release.  
>>  
>> Ralph  
