[jira] [Created] (FLUME-3000) Update morphline solr sink to use solr-6

2016-09-28 Thread wolfgang hoschek (JIRA)
wolfgang hoschek created FLUME-3000:
---

 Summary: Update morphline solr sink to use solr-6
 Key: FLUME-3000
 URL: https://issues.apache.org/jira/browse/FLUME-3000
 Project: Flume
  Issue Type: Bug
  Components: Sinks+Sources
Affects Versions: v1.5.2
Reporter: wolfgang hoschek
Assignee: wolfgang hoschek
Priority: Minor


Move Flume from solr-4 to solr-6. This involves changing Flume to depend on the 
upcoming kite-1.2 release, which in turn uses the solr-6 API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2633) Update Kite dependency to 1.0.0

2015-02-26 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339408#comment-14339408
 ] 

wolfgang hoschek commented on FLUME-2633:
-

Can you please also update the solr and tika versions to match what is used by 
kite-1.0? Otherwise there can be version conflicts and runtime errors in the 
flume morphline solr sink. Kite-1.0 requires solr-4.10.3 and tika-1.5.

> Update Kite dependency to 1.0.0
> ---
>
> Key: FLUME-2633
> URL: https://issues.apache.org/jira/browse/FLUME-2633
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Reporter: Tom White
>Assignee: Tom White
> Fix For: v1.6.0
>
> Attachments: FLUME-2633.patch
>
>
> Update the dataset sink to use Kite 1.0.0 which has a stable API.





[jira] [Commented] (FLUME-2571) Update Kite dependency to 0.17.0 (or 0.17.1)

2014-12-09 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239467#comment-14239467
 ] 

wolfgang hoschek commented on FLUME-2571:
-

Good idea!

> Update Kite dependency to 0.17.0 (or 0.17.1)
> 
>
> Key: FLUME-2571
> URL: https://issues.apache.org/jira/browse/FLUME-2571
> Project: Flume
>  Issue Type: Improvement
>Reporter: Santiago M. Mola
> Fix For: v1.6.0
>
>
> Flume works without any change with Kite 0.17.0.
> Kite 0.17.1 should be out really soon (this week?), so maybe we can wait and 
> upgrade directly to 0.17.1.





Re: [ANNOUNCE] New Flume PMC Member - Roshan Naik

2014-11-05 Thread Wolfgang Hoschek
Congrats Roshan!

On Nov 5, 2014, at 11:54 AM, Saravanan Nagarajan wrote:

> Congratulations Roshan!
> 
> On Wed, Nov 5, 2014 at 7:32 AM, Ahmed Radwan wrote:
> 
>> Congrats Roshan!
>> 
>> On Tue, Nov 4, 2014 at 2:12 PM, Arvind Prabhakar wrote:
>> 
>>> On behalf of Apache Flume PMC, it is my pleasure to announce that Roshan
>>> Naik has been elected to the Flume Project Management Committee. Roshan has
>>> been active with the project for many years and has been a committer on the
>>> project since September of 2013.
>>> 
>>> Please join me in congratulating Roshan and welcoming him to the Flume PMC.
>>> 
>>> Regards,
>>> Arvind Prabhakar
>>> 
>> 



[jira] [Commented] (FLUME-2530) Resource leaks found by Coverity tool

2014-10-29 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189067#comment-14189067
 ] 

wolfgang hoschek commented on FLUME-2530:
-

+1 to the change on MorphlineHandlerImpl if it makes coverity happy (even 
though the change is just cosmetic and has no observable effect).

> Resource leaks found by Coverity tool
> -
>
> Key: FLUME-2530
> URL: https://issues.apache.org/jira/browse/FLUME-2530
> Project: Flume
>  Issue Type: Bug
>Reporter: Roshan Naik
>Assignee: Roshan Naik
> Attachments: coverity.patch
>
>
> A recent run of coverity on the Flume code base found some issues in various 
> components.





[jira] [Commented] (FLUME-2501) Updating HttpClient lib version to ensure compat with Solr

2014-10-27 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185186#comment-14185186
 ] 

wolfgang hoschek commented on FLUME-2501:
-

+1 Looks good. Thanks!

> Updating HttpClient lib version to ensure compat with Solr
> --
>
> Key: FLUME-2501
> URL: https://issues.apache.org/jira/browse/FLUME-2501
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.0.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
> Attachments: FLUME-2501.patch, FLUME-2501.v2.patch
>
>
> Mismatch in httpclient and http core libs pulled by flume v/s the ones that 
> come with Solr causes errors at runtime
> {code}
> 2014-10-13 19:52:32,042 (lifecycleSupervisor-1-1) [DEBUG - org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:106)] Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
> 2014-10-13 19:52:32,225 (lifecycleSupervisor-1-1) [ERROR - org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:253)] Unable to start SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@4752b854 counterGroup:{ name:null counters:{} } } - Exception follows.
> java.lang.NoSuchFieldError: DEF_CONTENT_CHARSET
>   at org.apache.http.impl.client.DefaultHttpClient.setDefaultHttpParams(DefaultHttpClient.java:175)
>   at org.apache.http.impl.client.DefaultHttpClient.createHttpParams(DefaultHttpClient.java:158)
>   at org.apache.http.impl.client.AbstractHttpClient.getParams(AbstractHttpClient.java:448)
>   at org.apache.solr.client.solrj.impl.HttpClientUtil.setFollowRedirects(HttpClientUtil.java:251)
>   at org.apache.solr.client.solrj.impl.HttpClientConfigurer.configure(HttpClientConfigurer.java:58)
>   at org.apache.solr.client.solrj.impl.HttpClientUtil.configureClient(HttpClientUtil.java:133)
>   at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:109)
>   at org.apache.solr.client.solrj.impl.HttpSolrServer.<init>(HttpSolrServer.java:161)
>   at org.apache.solr.client.solrj.impl.HttpSolrServer.<init>(HttpSolrServer.java:138)
>   at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.<init>(ConcurrentUpdateSolrServer.java:122)
>   at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.<init>(ConcurrentUpdateSolrServer.java:114)
>   at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.<init>(ConcurrentUpdateSolrServer.java:104)
>   at org.kitesdk.morphline.solr.SafeConcurrentUpdateSolrServer.<init>(SafeConcurrentUpdateSolrServer.java:39)
>   at org.kitesdk.morphline.solr.SafeConcurrentUpdateSolrServer.<init>(SafeConcurrentUpdateSolrServer.java:35)
>   at org.kitesdk.morphline.solr.SolrLocator.getLoader(SolrLocator.java:116)
>   at org.kitesdk.morphline.solr.LoadSolrBuilder$LoadSolr.<init>(LoadSolrBuilder.java:70)
>   at org.kitesdk.morphline.solr.LoadSolrBuilder.build(LoadSolrBuilder.java:52)
>   at org.kitesdk.morphline.base.AbstractCommand.buildCommand(AbstractCommand.java:303)
>   at org.kitesdk.morphline.base.AbstractCommand.buildCommandChain(AbstractCommand.java:250)
>   at org.kitesdk.morphline.stdlib.Pipe.<init>(Pipe.java:46)
>   at org.kitesdk.morphline.stdlib.PipeBuilder.build(PipeBuilder.java:40)
>   at org.kitesdk.morphline.base.Compiler.compile(Compiler.java:126)
>   at org.kitesdk.morphline.base.Compiler.compile(Compiler.java:55)
>   at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.configure(MorphlineHandlerImpl.java:101)
>   at org.apache.flume.sink.solr.morphline.MorphlineSink.start(MorphlineSink.java:97)
>   at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:46)
>   at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
>   at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
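
A NoSuchFieldError like the one above usually means two jars disagree about a class at runtime. As a rough diagnostic sketch (not part of the Flume patch; the class name ClasspathProbe and its method are made up for illustration), one can ask the JVM which jar each suspect class was actually loaded from. The two default class names are taken from the stack trace; everything else is an assumption.

```java
import java.security.CodeSource;

// Hypothetical helper: reports which jar (or directory) a class was loaded from.
final class ClasspathProbe {

    /** Returns the code source location of the named class, or a short marker. */
    static String locate(String className) {
        try {
            // initialize=false: only resolve the class, do not run static initializers.
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "bootstrap (no code source)"; // JDK classes have no jar location
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Defaults come from the stack trace; pass other class names as arguments.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.http.impl.client.DefaultHttpClient",
            "org.apache.solr.client.solrj.impl.HttpClientUtil"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run with the same classpath as the Flume agent, this should show whether the httpclient classes come from the jar version that Solr expects or from an older one pulled in elsewhere.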





[jira] [Commented] (FLUME-2501) Updating HttpClient lib version to ensure compat with Solr

2014-10-16 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174143#comment-14174143
 ] 

wolfgang hoschek commented on FLUME-2501:
-

Similar reason here - Flume in CDH 5 uses 4.2.5.

> Updating HttpClient lib version to ensure compat with Solr
> --
>
> Key: FLUME-2501
> URL: https://issues.apache.org/jira/browse/FLUME-2501
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.0.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
> Attachments: FLUME-2501.patch
>
>
> Mismatch in httpclient and http core libs pulled by flume v/s the ones that 
> come with Solr causes errors at runtime





[jira] [Comment Edited] (FLUME-2501) Updating HttpClient lib version to ensure compat with Solr

2014-10-16 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173554#comment-14173554
 ] 

wolfgang hoschek edited comment on FLUME-2501 at 10/16/14 9:13 AM:
---

How about updating both httpclient and httpcore to 4.2.5? The reason is that I 
know that this works in production, whereas I don't know about 4.2.3.


was (Author: whoschek):
How about updating httpclient to 4.2.5 instead of 4.2.3? The reason is that I 
know that the former works in production, whereas I don't know about the latter.

> Updating HttpClient lib version to ensure compat with Solr
> --
>
> Key: FLUME-2501
> URL: https://issues.apache.org/jira/browse/FLUME-2501
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.0.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
> Attachments: FLUME-2501.patch
>
>
> Mismatch in httpclient and http core libs pulled by flume v/s the ones that 
> come with Solr causes errors at runtime

[jira] [Commented] (FLUME-2501) Updating HttpClient lib version to ensure compat with Solr

2014-10-16 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173554#comment-14173554
 ] 

wolfgang hoschek commented on FLUME-2501:
-

How about updating httpclient to 4.2.5 instead of 4.2.3? The reason is that I 
know that the former works in production, whereas I don't know about the latter.

> Updating HttpClient lib version to ensure compat with Solr
> --
>
> Key: FLUME-2501
> URL: https://issues.apache.org/jira/browse/FLUME-2501
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.0.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
> Attachments: FLUME-2501.patch
>
>
> Mismatch in httpclient and http core libs pulled by flume v/s the ones that 
> come with Solr causes errors at runtime

[jira] [Commented] (FLUME-2503) hadoop-1 profile is broken in Morphline Sink

2014-10-16 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173505#comment-14173505
 ] 

wolfgang hoschek commented on FLUME-2503:
-

+1 Patch looks good to me and unit tests continue to pass under both the 
hadoop-1 and hadoop-2 profiles.

> hadoop-1 profile is broken in Morphline Sink
> 
>
> Key: FLUME-2503
> URL: https://issues.apache.org/jira/browse/FLUME-2503
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2503.patch
>
>
> kite-morphlines-solr-core test-jar must also exclude hadoop-common:
> [INFO] Downloading: https://repository.apache.org/releases/org/eclipse/jetty/jetty-xml/8.1.8.v20121106/jetty-xml-8.1.8.v20121106.pom
> [INFO] Downloading: http://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-common/1.0.1/hadoop-common-1.0.1.pom
> [INFO] Downloading: https://oss.sonatype.org/content/repositories/releases/org/apache/hadoop/hadoop-common/1.0.1/hadoop-common-1.0.1.pom
> [INFO] Downloading: https://repository.apache.org/releases/org/apache/hadoop/hadoop-common/1.0.1/hadoop-common-1.0.1.pom
> [INFO] Downloading: https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hadoop/hadoop-common/1.0.1/hadoop-common-1.0.1.pom
> [INFO] Downloading: http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/1.0.1/hadoop-common-1.0.1.pom
> [INFO] Downloading: http://repository.jboss.org/nexus/content/groups/public/org/apache/hadoop/hadoop-common/1.0.1/hadoop-common-1.0.1.pom
> [INFO] Downloading: http://maven.twttr.com/org/apache/hadoop/hadoop-common/1.0.1/hadoop-common-1.0.1.pom
> [INFO] Downloading: http://maven.restlet.org/org/apache/hadoop/hadoop-common/1.0.1/hadoop-common-1.0.1.pom





[jira] [Commented] (FLUME-2491) Include Kite morphline dependencies for Morphline Solr sink

2014-10-02 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157437#comment-14157437
 ] 

wolfgang hoschek commented on FLUME-2491:
-

The advantage of marking the dependencies as "optional" is to avoid adding a 
lot of unnecessary libs to the classpath for folks that don't need morphlines. 
The disadvantage is the additional install complexity that Roshan is suffering 
from.  I'm ok with removing the "optional" marker if that's what people want.
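
To illustrate the runtime side of that trade-off: when a dependency is optional, its classes may simply be absent, and the failure surfaces late as a NoClassDefFoundError. Below is a minimal sketch of a fail-fast check, assuming nothing about how Flume itself handles this; the class OptionalDependencyCheck and its methods are hypothetical, and the default class name is a Kite class that appears in the morphline sink stack traces.

```java
// Hypothetical helper: checks up front whether an optional dependency is installed.
final class OptionalDependencyCheck {

    /** True if the named class can be found on the classpath (without initializing it). */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, OptionalDependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Throws with an actionable hint when the class is missing. */
    static void require(String className, String hint) {
        if (!isPresent(className)) {
            throw new IllegalStateException(
                "Missing optional dependency: " + className + ". " + hint);
        }
    }

    public static void main(String[] args) {
        // Default probe is a Kite morphline class; pass another name to check it instead.
        String probe = args.length > 0 ? args[0] : "org.kitesdk.morphline.base.Compiler";
        System.out.println(probe + " present: " + isPresent(probe));
    }
}
```

A check like this at sink startup would turn a confusing late failure into an early message telling the user which libs to install.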

> Include Kite morphline dependencies for Morphline Solr sink
> ---
>
> Key: FLUME-2491
> URL: https://issues.apache.org/jira/browse/FLUME-2491
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.0.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
> Attachments: FLUME-2491.patch
>
>
> Currently, the kite morphline library version required by flume does not 
> appear to be available for download or direct customer install by end user. 
> So pulling them into the flume binary distribution through maven.





[jira] [Commented] (FLUME-2491) Include Kite morphline dependencies for Morphline Solr sink

2014-10-02 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157425#comment-14157425
 ] 

wolfgang hoschek commented on FLUME-2491:
-

The solr and lucene libs are required for the morphline to talk to solr via 
these commands: 
http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/kite-morphlines-solr-core

hadoop-annotations-2.4.0.jar, hadoop-auth-2.4.0.jar, hadoop-hdfs-2.4.0.jar are 
pulled in by the solr client, which uses them to talk to solr securely via 
kerberos.

In this light, it would be better not to exclude these libs.


> Include Kite morphline dependencies for Morphline Solr sink
> ---
>
> Key: FLUME-2491
> URL: https://issues.apache.org/jira/browse/FLUME-2491
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.0.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
> Attachments: FLUME-2491.patch
>
>
> Currently, the kite morphline library version required by flume does not 
> appear to be available for download or direct customer install by end user. 
> So pulling them into the flume binary distribution through maven.





Re: Kite Morphline Sink dependencies

2014-10-02 Thread Wolfgang Hoschek
Here is a way to get hold of the dependencies:

https://groups.google.com/a/cloudera.org/forum/#!msg/cdk-dev/7T4pTebdWN4/sBHGkoS70LkJ

Maybe we should add this info to the pom.xml to make it easier for folks? 
Thoughts?

Wolfgang.

On Oct 2, 2014, at 2:40 AM, Hari Shreedharan wrote:

> Ryan/Wolfgang - Any pointers?
> 
> 
> Thanks,
> Hari
> 
> On Wed, Oct 1, 2014 at 5:08 PM, Roshan Naik wrote:
> 
>> In Flume 1.5 the kite morphline 0.12 lib dependencies are marked as
>> optional in the morphline-solr sink's pom, which is a change from Flume 1.4,
>> in which they were being pulled in.
>> An end user of Flume who wants to use this sink would need either
>> - a downloadable tgz for that 0.12 version of the kite morphline lib,
>> - or a yum-installable rpm/repo somewhere.
>> But I am not able to locate either. Can anybody shed some light on how to get it?
>> -roshan



[jira] [Commented] (FLUME-2449) Kite-0.15 brings in hadoop-2.x dependencies in hadoop-1 profile

2014-08-25 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109493#comment-14109493
 ] 

wolfgang hoschek commented on FLUME-2449:
-

org.apache.solr:solr-core indeed pulls in hadoop-2 dependencies, via 
kite-morphlines-all. However, I haven't heard of any morphline solr sink user 
trying to use hadoop-1. Also, considering that kite-morphlines-all is marked as 
<optional>true</optional> in the morphline solr sink pom, I feel like there's 
no real need to fix this. Thoughts, anyone?

> Kite-0.15 brings in hadoop-2.x dependencies in hadoop-1 profile
> ---
>
> Key: FLUME-2449
> URL: https://issues.apache.org/jira/browse/FLUME-2449
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>
> Mvn dependency tree shows this:
> {code}
> [INFO] org.apache.flume.flume-ng-sinks:flume-ng-morphline-solr-sink:jar:1.6.0-SNAPSHOT
> [INFO] +- org.apache.flume:flume-ng-core:jar:1.6.0-SNAPSHOT:compile
> [INFO] |  +- org.apache.flume:flume-ng-sdk:jar:1.6.0-SNAPSHOT:compile
> [INFO] |  +- org.apache.flume:flume-ng-configuration:jar:1.6.0-SNAPSHOT:compile
> [INFO] |  +- com.google.guava:guava:jar:11.0.2:compile
> [INFO] |  |  \- com.google.code.findbugs:jsr305:jar:1.3.9:compile
> [INFO] |  +- commons-io:commons-io:jar:2.1:compile
> [INFO] |  +- commons-codec:commons-codec:jar:1.8:compile
> [INFO] |  +- log4j:log4j:jar:1.2.17:compile
> [INFO] |  +- org.slf4j:slf4j-log4j12:jar:1.6.1:compile
> [INFO] |  +- commons-cli:commons-cli:jar:1.2:compile
> [INFO] |  +- commons-lang:commons-lang:jar:2.5:compile
> [INFO] |  +- org.apache.avro:avro:jar:1.7.4:compile
> [INFO] |  |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.3:compile (version managed from 1.8.8)
> [INFO] |  |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.3:compile (version managed from 1.8.8)
> [INFO] |  |  +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
> [INFO] |  |  +- org.xerial.snappy:snappy-java:jar:1.1.0:compile (version managed from 1.0.4.1)
> [INFO] |  |  \- org.apache.commons:commons-compress:jar:1.4.1:compile
> [INFO] |  |     \- org.tukaani:xz:jar:1.0:compile
> [INFO] |  +- org.apache.avro:avro-ipc:jar:1.7.4:compile
> [INFO] |  |  \- org.apache.velocity:velocity:jar:1.7:compile
> [INFO] |  |     \- commons-collections:commons-collections:jar:3.2.1:compile
> [INFO] |  +- io.netty:netty:jar:3.5.12.Final:compile
> [INFO] |  +- joda-time:joda-time:jar:2.1:compile
> [INFO] |  +- org.mortbay.jetty:servlet-api:jar:2.5-20110124:compile
> [INFO] |  +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
> [INFO] |  +- org.mortbay.jetty:jetty:jar:6.1.26:compile
> [INFO] |  +- com.google.code.gson:gson:jar:2.2.2:compile
> [INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
> [INFO] |  |  \- org.apache.httpcomponents:httpclient:jar:4.2.1:compile (version managed from 4.0.1)
> [INFO] |  |     +- org.apache.httpcomponents:httpcore:jar:4.2.1:compile
> [INFO] |  |     \- commons-logging:commons-logging:jar:1.1.1:compile
> [INFO] |  \- org.apache.mina:mina-core:jar:2.0.4:compile
> [INFO] +- org.slf4j:slf4j-api:jar:1.6.1:compile
> [INFO] +- org.kitesdk:kite-morphlines-all:pom:0.15.0:compile
> [INFO] |  +- org.kitesdk:kite-morphlines-core:jar:0.15.0:compile
> [INFO] |  |  +- com.typesafe:config:jar:1.0.2:compile
> [INFO] |  |  +- com.codahale.metrics:metrics-core:jar:3.0.2:compile
> [INFO] |  |  \- com.codahale.metrics:metrics-healthchecks:jar:3.0.2:compile
> [INFO] |  +- org.kitesdk:kite-morphlines-avro:jar:0.15.0:compile
> [INFO] |  +- org.kitesdk:kite-morphlines-json:jar:0.15.0:compile
> [INFO] |  |  \- com.fasterxml.jackson.core:jackson-databind:jar:2.3.1:compile
> [INFO] |  |     +- com.fasterxml.jackson.core:jackson-annotations:jar:2.3.0:compile
> [INFO] |  |     \- com.fasterxml.jackson.core:jackson-core:jar:2.3.1:compile
> [INFO] |  +- org.kitesdk:kite-morphlines-saxon:jar:0.15.0:compile
> [INFO] |  |  +- net.sf.saxon:Saxon-HE:jar:9.5.1-5:compile
> [INFO] |  |  \- org.ccil.cowan.tagsoup:tagsoup:jar:1.2.1:compile
> [INFO] |  +- org.kitesdk:kite-morphlines-hadoop-core:jar:0.15.0:compile
> [INFO] |  |  \- org.kitesdk:kite-hadoop-compatibility:jar:0.15.0:compile
> [INFO] |  +- org.kitesdk:kite-morphlines-hadoop-parquet-avro:jar:0.15.0:compile
> [INFO] |  |  \- com.twitter:parquet-avro:jar:1.4.1:compile
> [INFO] |  |     +- com.twitter:parquet-column:jar:1.4.1:compile
> [INFO] |  |     |  +- com.twitter:parquet-common:jar:1.4.1:compile
> [INFO] |  |     |  \- com.twitter:parquet-encoding:jar:1.4.1:compile
> [INFO] |  |     |     \- com.twitter:parquet-g

Re: Getting java.lang.OutOfMemoryError: GC overhead limit exceeded with Morphline Interceptor

2014-06-18 Thread Wolfgang Hoschek
The default memory settings for flume are extremely low. Try giving it more 
Java memory.
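
A minimal sketch of one way to do that, assuming the agent is started with
--conf ./conf as in the command quoted below, so that the flume-ng launcher
picks up conf/flume-env.sh. The heap sizes are illustrative placeholders, not
recommendations:

```shell
# conf/flume-env.sh (read by the flume-ng launcher from the --conf directory)
# Illustrative values only; size the heap to the channel capacity and the
# number of morphline commands being imported.
export JAVA_OPTS="-Xms256m -Xmx1g"
```

Restart the agent after changing the file so the new JVM options take effect.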

On Jun 18, 2014, at 12:57 PM, David Gates  wrote:

> When running a test to see if I can get Morphline Interceptor working to
> convert some timestamps in logfiles, I am getting a java GC overhead limit
> exceeded error.
> 
> Command:
> 
> flume-ng agent --conf ./conf -f testflume.conf
> -Dflume.root.logger=DBUG,console -n agent
> 
> testflume.conf:
> 
> agent.channels.memory-channel.type = memory
> agent.sources.spool-source.type = spooldir
> agent.sources.spool-source.spoolDir = /home/impala/spool/
> agent.sources.spool-source.channels = memory-channel
> agent.sources.spool-source.interceptors = morphlineinterceptor
> agent.sources.spool-source.interceptors.morphlineinterceptor.type =
> org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
> agent.sources.spool-source.interceptors.morphlineinterceptor.morphlineFile
> = /root/morphline.conf
> agent.sources.spool-source.interceptors.morphlineinterceptor.morphlineId =
> morphline1
> agent.sinks.hdfs-sink.channel = memory-channel
> agent.sinks.hdfs-sink.type = hdfs
> agent.sinks.hdfs-sink.hdfs.path = /user/impala/
> agent.sinks.hdfs-sink.hdfs.fileType = DataStream
> agent.channels = memory-channel
> agent.sources = spool-source
> agent.sinks = hdfs-sink
> 
> morphline.conf:
> 
> morphlines : [
>{
>id : morphline1
>importCommands : ["com.cloudera.**"]
> 
>commands : [
>{
>readCSV {
>separator: ";"
>trim: true
>columns:
> [Header1,Header2,Header3,ConnectionType,SessionID,ReleaseCause,StartTime,AnswerTime,ReleaseTime,MinutesWest,ReleaseCauseProto,ReleaseCauseNum,FirstReleaseDialogue,TrunkIDOrig,VOIPProtoOrig,SourceNumOrig,SourceHostOrig,DestNumOrig,DestHostOrig,OrigCallID,OrigRemotePayloadIP,OrigRemotePayloadUDP,OrigLocalPayloadIP,OrigLocalPayloadUDP,OrigCodecList,OrigIngressPackets,OrigEgressPackets,OrigIngressOctets,OrigEgressOctets,OrigIngressPacketLoss,OrigIngressDelay,OrigIngressJitter,TrunkIDTerm,VOIPProtoTerm,SourceNumTerm,SourceHostTerm,DestNumTerm,DestHostTerm,TermCallID,TermRemotePayloadIP,TermRemotePayloadUDP,TermLocalPayloadIP,TermLocalPayloadUDP,TermCodecList,TermIngressPackets,TermEgressPackets,TermIngressOctets,TermEgressOctets,TermIngpressPacketLoss,TermIngressDelay,TermIngressJitter,FinalRouteIndication,RoutingDigits,CallDurSec,PostDialDelaySec,RingTimeSec,DurMiniSec,ConfID,RPIDANI,RouteEntryIndex,RouteTable,LNPDip,IngressLRN,EgressLRN,CNAMDip,DNCDip,OrigTrunkAlias,TermTrunkAlias,ERSDip,OLIDigits]
>}
>}
>]
>}
> ]
> 
> 
> I originally also had a convertTimestamp command in there but removed
> it to troubleshoot.
> 
> The error I get when running is as follows:
> 
> 14/05/21 08:37:23 INFO api.MorphlineContext: Importing commands
> 14/05/21 08:37:31 ERROR node.PollingPropertiesFileConfigurationProvider:
> Unhandled error
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 
> 
> I've tried googling but I can't find anything specific to flume/morphline
> with GC limit exceeded; any help/ideas would be appreciated.



[jira] [Commented] (FLUME-2392) Flume MorphlineSink can not work because of java.lang.NoClassDefFoundError: org/kitesdk/morphline/api/MorphlineCompilationException

2014-05-29 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012197#comment-14012197
 ] 

wolfgang hoschek commented on FLUME-2392:
-

The dependencies have been made “optional” in 
flume-ng-sinks/flume-ng-morphline-solr-sink/pom.xml via 
<optional>true</optional>, thus the dependencies don’t ship automatically with 
the build.

Here is a flume-centric way of getting hold of all the jars and plugging them 
into flume: 
https://groups.google.com/a/cloudera.org/d/msg/cdk-dev/7T4pTebdWN4/sBHGkoS70LkJ


> Flume MorphlineSink can not work because of java.lang.NoClassDefFoundError: 
> org/kitesdk/morphline/api/MorphlineCompilationException
> ---
>
> Key: FLUME-2392
> URL: https://issues.apache.org/jira/browse/FLUME-2392
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: liyunzhang
> Fix For: v1.6.0
>
>
> Test Flume+Solr, use flume 1.5 + solr 4.6.
> You can reproduce it by following steps
> 1. Download 
> http://www.apache.org/dyn/closer.cgi/flume/1.5.0/apache-flume-1.5.0-bin.tar.gz
> 2. install: extract  to /usr/lib/apache-flume-1.5.0-bin
> 3. create 
> /usr/lib/apache-flume-1.5.0-bin/conf/flume-conf-morphlineSolr.properties
>  #cat /usr/lib/apache-flume-1.5.0-bin/conf/flume-conf-morphlineSolr.properties
> a1.channels = c1
> a1.sources = r1
> a1.sinks = k1
> a1.channels.c1.type = memory
> a1.sources.r1.channels = c1
> a1.sources.r1.type = exec
> a1.sources.r1.command = tail -F /var/log/a1.new.log
> a1.sinks.k1.channel = c1
> a1.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
> a1.sinks.k1.morphlineFile = 
> /usr/lib/apache-flume-1.5.0-bin/conf/morphline.conf
> a1.channels.MemChannel.type = memory
> a1.channels.MemChannel.capacity = 1
> a1.channels.MemChannel.transactionCapacity = 100
> 4. create /usr/lib/apache-flume-1.5.0-bin/conf/morphline.conf
> #cat /usr/lib/apache-flume-1.5.0-bin/conf/morphline.conf
> morphlines : [
> {
>  # Name used to identify a morphline. E.g. used if there are multiple
>  # morphlines in a morphline config file
>  id : morphline1
>  # Import all morphline commands in these java packages and their
> # subpackages. Other commands that may be present on the classpath are
>  # not visible to this morphline.
>  importCommands : ["com.cloudera.**", "org.apache.solr.**"]
>  
> commands : [
>   {
> # Parse input attachment and emit a record for each input line
> readLine {
>   charset : UTF-8
> }
>   }
>   {
> grok {
>   # Consume the output record of the previous command and pipe another
>   # record downstream.
>   #
>   # A grok-dictionary is a config file that contains prefabricated
>   # regular expressions that can be referred to by name. grok patterns
>   # specify such a regex name, plus an optional output field name.
>   # The syntax is %{REGEX_NAME:OUTPUT_FIELD_NAME}
>   # The input line is expected in the "message" input field.
>   #dictionaryFiles : [src/test/resources/grok-dictionaries]
>   dictionaryFiles : [/etc/flume/conf/grok-dictionaries]
>   expressions : {
> message : """%{TIMESTAMP_LOG:timestamp} %{LOGLEVEL:loglevel} 
> %{DATA:classname}: %{GREEDYDATA:msg}"""
>   }
> }
>   }
>   # Consume the output record of the previous command, convert
>   # the timestamp, and pipe another record downstream.
>   #
>   # convert timestamp field to native Solr timestamp format
>   # e.g. 2012-09-06T07:14:34Z to 2012-09-06T07:14:34.000Z
>   {
> convertTimestamp {
>   field : timestamp
>   inputFormats : ["yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", "yyyy-MM-dd HH:mm:ss,SSS"]
>   inputTimezone : America/Los_Angeles
>   outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
>   outputTimezone : UTC
> }
>   }
>   {
> generateUUID {
>field : id
> }
>   }
>   # Consume the output record of the previous command, transform it
>   # and pipe the record downstream.
>   #
>   # This command deletes record fields that are unknown to Solr
> 

[jira] [Resolved] (FLUME-2392) Flume MorphlineSink can not work because of java.lang.NoClassDefFoundError: org/kitesdk/morphline/api/MorphlineCompilationException

2014-05-29 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek resolved FLUME-2392.
-

Resolution: Won't Fix

> Flume MorphlineSink can not work because of java.lang.NoClassDefFoundError: 
> org/kitesdk/morphline/api/MorphlineCompilationException
> ---
>
> Key: FLUME-2392
> URL: https://issues.apache.org/jira/browse/FLUME-2392
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: liyunzhang
> Fix For: v1.6.0
>
>
> Test Flume+Solr, use flume 1.5 + solr 4.6.
> You can reproduce it by following steps
> 1. Download 
> http://www.apache.org/dyn/closer.cgi/flume/1.5.0/apache-flume-1.5.0-bin.tar.gz
> 2. install: extract  to /usr/lib/apache-flume-1.5.0-bin
> 3. create 
> /usr/lib/apache-flume-1.5.0-bin/conf/flume-conf-morphlineSolr.properties
>  #cat /usr/lib/apache-flume-1.5.0-bin/conf/flume-conf-morphlineSolr.properties
> a1.channels = c1
> a1.sources = r1
> a1.sinks = k1
> a1.channels.c1.type = memory
> a1.sources.r1.channels = c1
> a1.sources.r1.type = exec
> a1.sources.r1.command = tail -F /var/log/a1.new.log
> a1.sinks.k1.channel = c1
> a1.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
> a1.sinks.k1.morphlineFile = 
> /usr/lib/apache-flume-1.5.0-bin/conf/morphline.conf
> a1.channels.MemChannel.type = memory
> a1.channels.MemChannel.capacity = 1
> a1.channels.MemChannel.transactionCapacity = 100
> 4. create /usr/lib/apache-flume-1.5.0-bin/conf/morphline.conf
> #cat /usr/lib/apache-flume-1.5.0-bin/conf/morphline.conf
> morphlines : [
> {
>  # Name used to identify a morphline. E.g. used if there are multiple
>  # morphlines in a morphline config file
>  id : morphline1
>  # Import all morphline commands in these java packages and their
> # subpackages. Other commands that may be present on the classpath are
>  # not visible to this morphline.
>  importCommands : ["com.cloudera.**", "org.apache.solr.**"]
>  
> commands : [
>   {
> # Parse input attachment and emit a record for each input line
> readLine {
>   charset : UTF-8
> }
>   }
>   {
> grok {
>   # Consume the output record of the previous command and pipe another
>   # record downstream.
>   #
>   # A grok-dictionary is a config file that contains prefabricated
>   # regular expressions that can be referred to by name. grok patterns
>   # specify such a regex name, plus an optional output field name.
>   # The syntax is %{REGEX_NAME:OUTPUT_FIELD_NAME}
>   # The input line is expected in the "message" input field.
>   #dictionaryFiles : [src/test/resources/grok-dictionaries]
>   dictionaryFiles : [/etc/flume/conf/grok-dictionaries]
>   expressions : {
> message : """%{TIMESTAMP_LOG:timestamp} %{LOGLEVEL:loglevel} 
> %{DATA:classname}: %{GREEDYDATA:msg}"""
>   }
> }
>   }
>   # Consume the output record of the previous command, convert
>   # the timestamp, and pipe another record downstream.
>   #
>   # convert timestamp field to native Solr timestamp format
>   # e.g. 2012-09-06T07:14:34Z to 2012-09-06T07:14:34.000Z
>   {
> convertTimestamp {
>   field : timestamp
>   inputFormats : ["yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", "yyyy-MM-dd HH:mm:ss,SSS"]
>   inputTimezone : America/Los_Angeles
>   outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
>   outputTimezone : UTC
> }
>   }
>   {
> generateUUID {
>field : id
> }
>   }
>   # Consume the output record of the previous command, transform it
>   # and pipe the record downstream.
>   #
>   # This command deletes record fields that are unknown to Solr
>   # schema.xml. Recall that Solr throws an exception on any attempt to
>   # load a document that contains a field that isn't specified in
>   # schema.xml.
>   {
> sanitizeUnknownSolrFields {
>   # Location from which to fetch Solr schema
>   solrLocator : {
> collection : collection1   # Name of solr collection
> 

Re: [VOTE] Apache Flume 1.5.0 RC1

2014-05-18 Thread Wolfgang Hoschek
+1 Looks good to me!

Wolfgang Hoschek

On May 17, 2014, at 8:47 PM, Hari Shreedharan  wrote:

> Flume PMC - Please vote!
> 
> 
> On Wed, May 14, 2014 at 12:31 AM, Arvind Prabhakar wrote:
> 
>> +1
>> 
>> * Verified signatures and checksums for both binary and source tarballs
>> * Rat check looks good on source tarball
>> * Nit: Notice file has dated header, needs to be updated but not a blocker
>> 
>> Regards,
>> Arvind Prabhakar
>> 
>> 
>> On Wed, May 7, 2014 at 3:28 PM, Hari Shreedharan
>> wrote:
>> 
>>> This is a vote for the next release of Apache Flume, version 1.5.0. We are
>>> voting on release candidate RC1.
>>> 
>>> It fixes the following issues:
>>>  http://s.apache.org/4eQ
>>> 
>>> *** Please cast your vote within the next 72 hours ***
>>> 
>>> The tarball (*.tar.gz), signature (*.asc), and checksums (*.md5,
>>> *.sha1) for the source and binary artifacts can be found here:
>>>   https://people.apache.org/~hshreedharan/apache-flume-1.5.0-rc1/
>>> 
>>> Maven staging repo:
>>>   https://repository.apache.org/content/repositories/orgapacheflume-1001/
>>> 
>>> The tag to be voted on:
>>>   https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=8633220df808c4cd0c13d1cf0320454a94f1ea97
>>> 
>>> Flume's KEYS file containing PGP keys we use to sign the release:
>>>  http://www.apache.org/dist/flume/KEYS
>>> 
>>> 
>>> Thanks,
>>> Hari
>>> 
>> 



Re: Flume Jambalaya - A Flume Plugin with Multiple Components

2014-05-02 Thread Wolfgang Hoschek
My sense is that a) is interesting if it evolves into a capable, true native 
tailer, whereas b) is already available in Flume, and c) and d) are already 
available in Flume via the MorphlineInterceptor.

Wolfgang.

On May 3, 2014, at 12:18 AM, Israel Ekpo  wrote:

> Flume Community,
> 
> I created a Flume Plugin with multiple components that complements the 
> current version of Apache Flume.
> 
> This was necessary as part of a personal project I am working on.
> 
> It is code named - Flume Jambalaya
> 
> Jambalaya is a standalone Apache Flume plugin that contains a variety of 
> sources, interceptors, channels, sinks, serializers and other components 
> designed to extend the Flume architecture. It has been released under the 
> Apache License version 2.0
> 
> https://github.com/aicer/flume-jambalaya
> 
> It currently contains:
> 
> (a) File Source - This source lets you ingest data by tailing files from a 
> specific path
> (b) ElasticSearch HTTP Sink - This sink sends events to an ElasticSearch 
> cluster via HTTP with no dependency on the ElasticSearch versions between 
> Flume and the Server cluster.
> (c) DateInterceptor - The date interceptor is used for parsing dates from 
> fields and using that date or timestamp as the timestamp for the Flume event.
> (d) Grok Interceptor - allows you to extract structured data from 
> unstructured text and inject them as headers into the event
> 
> Sample configuration files are available here
> 
> https://github.com/aicer/flume-jambalaya/tree/master/sample-configuration-files
> 
> I did not realize that the Flume trunk already has an HTTP Sink for 
> ElasticSearch, so you can decide whether or not to use the sink that comes 
> with it.
> 
> I am still testing and integrating the various components.
> 
> Please check it out when you get a chance and send me some feedback
> 
> Thanks.
> 



[jira] [Resolved] (FLUME-2317) Upgrade Dataset sink to kite-0.12.0

2014-03-13 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek resolved FLUME-2317.
-

Resolution: Duplicate

Dupe of https://issues.apache.org/jira/browse/FLUME-2345


> Upgrade Dataset sink to kite-0.12.0
> ---
>
> Key: FLUME-2317
> URL: https://issues.apache.org/jira/browse/FLUME-2317
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>        Reporter: wolfgang hoschek
> Fix For: v1.5.0
>
> Attachments: FLUME-2317-v2.patch
>
>
> Release notes are here: http://kitesdk.org/docs/current/release_notes.html



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (FLUME-2316) Upgrade MorphlineSolrSink to kite-0.12.0

2014-03-13 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek resolved FLUME-2316.
-

Resolution: Fixed

Already fixed by https://issues.apache.org/jira/browse/FLUME-2345

> Upgrade MorphlineSolrSink to kite-0.12.0
> 
>
> Key: FLUME-2316
> URL: https://issues.apache.org/jira/browse/FLUME-2316
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
> Fix For: v1.5.0
>
> Attachments: FLUME-2316-v3.patch
>
>
> Release notes are here: http://kitesdk.org/docs/current/release_notes.html





Re: Flume 1.5

2014-03-13 Thread Wolfgang Hoschek
These patches are also ready to go:

https://issues.apache.org/jira/browse/FLUME-2316
https://issues.apache.org/jira/browse/FLUME-2330

Wolfgang.

On Mar 13, 2014, at 1:35 PM, Hari Shreedharan  wrote:

> There are a couple patches - the bidirectional HTTP source handler and the
> elastic search HTTP api update which might take some time to get committed.
> I am inclined to wait for these two before rolling the RC. So once these
> are in, I will roll the RC. This might mean a delay of some time, but these
> are good features to pull in.
> 
> 
> Hari
> 
> 
> On Wed, Mar 12, 2014 at 7:10 AM, Alexandre DUTRA  wrote:
> 
>> This one has been open for a very long time and has a patch available:
>> 
>> FLUME-2215
>> 
>> Regards,
>> 
>> Alexandre
>> 
>> Le mercredi 12 mars 2014, Ashish  a écrit :
>> 
>>> If possible, can you please take a look at following and include whatever
>>> is possible
>>> 
>>> Doc Patches
>>> ---
>>> https://issues.apache.org/jira/browse/FLUME-2237
>>> https://issues.apache.org/jira/browse/FLUME-2024
>>> https://issues.apache.org/jira/browse/FLUME-1521
>>> 
>>> Code Patches
>>> 
>>> https://issues.apache.org/jira/browse/FLUME-2186 (almost done, might need
>>> a relook)
>>> https://issues.apache.org/jira/browse/FLUME-2006 (Review pending)
>>> https://issues.apache.org/jira/browse/FLUME-1710 (Minor, Review pending)
>>> https://issues.apache.org/jira/browse/FLUME-1501 (Review done, need to be
>>> committed)
>>> https://issues.apache.org/jira/browse/FLUME-1491 (Major one, may not be
>>> able to make it to 1.5, but worth a look)
>>> https://issues.apache.org/jira/browse/FLUME-1365 (May be a dup, need
>>> someone to confirm)
>>> https://issues.apache.org/jira/browse/FLUME-1281
>>> 
>>> thanks
>>> ashish
>>> 
>>> 
>>> 
>>> On Wed, Mar 12, 2014 at 12:35 AM, Hari Shreedharan <
>>> hshreedha...@cloudera.com > wrote:
>>> 
>>>> Hi guys,
>>>> 
>>>> I am planning to spin an RC for Flume 1.5 late this week. If you have
>>>> patches that you want in 1.5, reply to this thread. I will try and
>>>> review it
>>>> 
>>>> 
>>>> Thanks,
>>>> Hari
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> thanks
>>> ashish
>>> 
>>> Blog: http://www.ashishpaliwal.com/blog
>>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>>> 
>> 



[jira] [Commented] (FLUME-2330) Remove the MorphlineHandlerImpl configuration option from MorphlineSink

2014-03-12 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932736#comment-13932736
 ] 

wolfgang hoschek commented on FLUME-2330:
-

After thinking about this some more, I attached a patch that removes the 
handlerClass configuration option completely because I don't think anyone uses 
this configuration feature.

> Remove the MorphlineHandlerImpl configuration option from MorphlineSink
> ---
>
> Key: FLUME-2330
> URL: https://issues.apache.org/jira/browse/FLUME-2330
> Project: Flume
>  Issue Type: Bug
>Affects Versions: v1.4.0
>Reporter: Hari Shreedharan
> Fix For: v1.5.0
>
> Attachments: FLUME-2330-v1.patch
>
>
> Removing this would be incompatible, but it is best to not have this option 
> open to users. Let's remove this from the docs so no new users use it while 
> it does not break existing ones who may be using it.





[jira] [Updated] (FLUME-2330) Remove the MorphlineHandlerImpl configuration option from MorphlineSink

2014-03-12 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2330:


Affects Version/s: v1.4.0
Fix Version/s: v1.5.0

> Remove the MorphlineHandlerImpl configuration option from MorphlineSink
> ---
>
> Key: FLUME-2330
> URL: https://issues.apache.org/jira/browse/FLUME-2330
> Project: Flume
>  Issue Type: Bug
>Affects Versions: v1.4.0
>Reporter: Hari Shreedharan
> Fix For: v1.5.0
>
> Attachments: FLUME-2330-v1.patch
>
>
> Removing this would be incompatible, but it is best to not have this option 
> open to users. Let's remove this from the docs so no new users use it while 
> it does not break existing ones who may be using it.





[jira] [Updated] (FLUME-2330) Remove the MorphlineHandlerImpl configuration option from MorphlineSink

2014-03-12 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2330:


Attachment: FLUME-2330-v1.patch

> Remove the MorphlineHandlerImpl configuration option from MorphlineSink
> ---
>
> Key: FLUME-2330
> URL: https://issues.apache.org/jira/browse/FLUME-2330
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
> Attachments: FLUME-2330-v1.patch
>
>
> Removing this would be incompatible, but it is best to not have this option 
> open to users. Let's remove this from the docs so no new users use it while 
> it does not break existing ones who may be using it.





[jira] [Updated] (FLUME-2317) Upgrade Dataset sink to kite-0.12.0

2014-03-11 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2317:


Attachment: FLUME-2317-v2.patch

> Upgrade Dataset sink to kite-0.12.0
> ---
>
> Key: FLUME-2317
> URL: https://issues.apache.org/jira/browse/FLUME-2317
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>        Reporter: wolfgang hoschek
> Fix For: v1.5.0
>
> Attachments: FLUME-2317-v2.patch
>
>
> Release notes are here: http://kitesdk.org/docs/current/release_notes.html





[jira] [Updated] (FLUME-2317) Upgrade Dataset sink to kite-0.12.0

2014-03-11 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2317:


Attachment: (was: FLUME-2317-v1.patch)

> Upgrade Dataset sink to kite-0.12.0
> ---
>
> Key: FLUME-2317
> URL: https://issues.apache.org/jira/browse/FLUME-2317
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>        Reporter: wolfgang hoschek
> Fix For: v1.5.0
>
> Attachments: FLUME-2317-v2.patch
>
>
> Release notes are here: http://kitesdk.org/docs/current/release_notes.html





[jira] [Updated] (FLUME-2317) Upgrade Dataset sink to kite-0.12.0

2014-03-11 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2317:


Summary: Upgrade Dataset sink to kite-0.12.0  (was: Upgrade Dataset sink to 
kite-0.11.0)

> Upgrade Dataset sink to kite-0.12.0
> ---
>
> Key: FLUME-2317
> URL: https://issues.apache.org/jira/browse/FLUME-2317
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>        Reporter: wolfgang hoschek
> Fix For: v1.5.0
>
> Attachments: FLUME-2317-v1.patch
>
>
> Release notes are here: http://kitesdk.org/docs/current/release_notes.html





[jira] [Updated] (FLUME-2316) Upgrade MorphlineSolrSink to kite-0.12.0

2014-03-11 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2316:


Attachment: (was: FLUME-2316-v2.patch)

> Upgrade MorphlineSolrSink to kite-0.12.0
> 
>
> Key: FLUME-2316
> URL: https://issues.apache.org/jira/browse/FLUME-2316
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
> Fix For: v1.5.0
>
> Attachments: FLUME-2316-v3.patch
>
>
> Release notes are here: http://kitesdk.org/docs/current/release_notes.html





[jira] [Updated] (FLUME-2316) Upgrade MorphlineSolrSink to kite-0.12.0

2014-03-11 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2316:


Attachment: FLUME-2316-v3.patch

> Upgrade MorphlineSolrSink to kite-0.12.0
> 
>
> Key: FLUME-2316
> URL: https://issues.apache.org/jira/browse/FLUME-2316
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
> Fix For: v1.5.0
>
> Attachments: FLUME-2316-v3.patch
>
>
> Release notes are here: http://kitesdk.org/docs/current/release_notes.html





[jira] [Updated] (FLUME-2316) Upgrade MorphlineSolrSink to kite-0.12.0

2014-03-11 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2316:


Summary: Upgrade MorphlineSolrSink to kite-0.12.0  (was: Upgrade 
MorphlineSolrSink to kite-0.11.0)

> Upgrade MorphlineSolrSink to kite-0.12.0
> 
>
> Key: FLUME-2316
> URL: https://issues.apache.org/jira/browse/FLUME-2316
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
> Fix For: v1.5.0
>
> Attachments: FLUME-2316-v3.patch
>
>
> Release notes are here: http://kitesdk.org/docs/current/release_notes.html





[jira] [Commented] (FLUME-2340) Refactor to make room for Morphlines Elasticsearch Sink

2014-03-05 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921192#comment-13921192
 ] 

wolfgang hoschek commented on FLUME-2340:
-

[~otis] Once someone has put together the above parts, we could take the next 
step: refactoring into three Flume Maven modules, with one module as a base, 
another module for Solr dependencies, and a third module for ES dependencies:

- Add a new flume module flume-ng-morphline-sink that's a copy and paste of 
flume-ng-morphline-solr-sink except that it has a minimum of dependencies: 
depends on kite-morphlines-core instead of kite-morphlines-all, and in 
particular doesn't contain tests and classes that depend on solr (or ES). E.g. 
it contains MorphlineSink.java and MorphlineInterceptor.java but not 
MorphlineSolrSink.java.
- Add flume-ng-morphline-elasticsearch-sink that depends on 
flume-ng-morphline-sink and also kite-morphlines-all-except-solr, plus 
kite-morphlines-elasticsearch. This means you can use class MorphlineSink from 
flume-ng-morphline-sink as is for ES.
- Change flume-ng-morphline-solr-sink to depend on flume-ng-morphline-sink and 
also kite-morphlines-all. Retain backwards compat by retaining existing classes 
in flume-ng-morphline-solr-sink, yet pointing them to extend or use relevant 
things in flume-ng-morphline-sink

> Refactor to make room for Morphlines Elasticsearch Sink
> ---
>
> Key: FLUME-2340
> URL: https://issues.apache.org/jira/browse/FLUME-2340
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Reporter: Otis Gospodnetic
> Fix For: v1.5.0
>
>
> Right now there are some non-Solr-specific classes in 
> org.apache.flume.sink.solr.morphline  and everything assumes data will get 
> loaded into Solr.  This should be refactored to make it possible to use 
> Morphlines and send data to Elasticsearch, too, for example.
> See 
> http://search-hadoop.com/m/Jrb3G1tSCQK1&subj=Re+Questions+about+Morphline+Solr+Sink+structure





[jira] [Commented] (FLUME-2340) Refactor to make room for Morphlines Elasticsearch Sink

2014-03-03 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918531#comment-13918531
 ] 

wolfgang hoschek commented on FLUME-2340:
-

Here is what enabling ES would involve:

- Add a loadElasticSearch command in a corresponding 
kite-morphlines-elasticsearch maven module
- change kite-morphlines-all to kite-morphlines-all-except-solr in 
flume-ng-sinks/flume-ng-morphline-solr-sink/pom.xml
- add any necessary ES dependency jars to pom.xml
- add unit tests and integration tests

This doesn't require yet another sink - ES can be enabled with these packaging 
changes. 

> Refactor to make room for Morphlines Elasticsearch Sink
> ---
>
> Key: FLUME-2340
> URL: https://issues.apache.org/jira/browse/FLUME-2340
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Reporter: Otis Gospodnetic
> Fix For: v1.5.0
>
>
> Right now there are some non-Solr-specific classes in 
> org.apache.flume.sink.solr.morphline  and everything assumes data will get 
> loaded into Solr.  This should be refactored to make it possible to use 
> Morphlines and send data to Elasticsearch, too, for example.
> See 
> http://search-hadoop.com/m/Jrb3G1tSCQK1&subj=Re+Questions+about+Morphline+Solr+Sink+structure





[jira] [Commented] (FLUME-2329) Add an alias for the Morphline Solr Sink

2014-02-20 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907692#comment-13907692
 ] 

wolfgang hoschek commented on FLUME-2329:
-

Looks like a patch for a different JIRA got accidentally attached. Pls check.

> Add an alias for the Morphline Solr Sink
> 
>
> Key: FLUME-2329
> URL: https://issues.apache.org/jira/browse/FLUME-2329
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2324.patch
>
>






[jira] [Commented] (FLUME-2323) Morphline sink must increment eventDrainAttemptCount when it takes event from channel

2014-02-15 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902565#comment-13902565
 ] 

wolfgang hoschek commented on FLUME-2323:
-

+1 Looks good to me.

> Morphline sink must increment eventDrainAttemptCount when it takes event from 
> channel
> -
>
> Key: FLUME-2323
> URL: https://issues.apache.org/jira/browse/FLUME-2323
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2323.patch
>
>






Re: Event header validation using interceptors

2014-02-13 Thread Wolfgang Hoschek
The morphline interceptor puts all Flume event headers plus the Flume event 
body into the input morphline record, so morphline commands can match on the 
entire Flume event.
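
A minimal sketch of that idea in plain Java, without any Flume or Kite 
dependency. The class HeaderMatchSketch, its two helpers, and the field name 
"_attachment_body" are assumptions made for illustration, not the actual 
interceptor code: headers and body are copied into one record, and a regex is 
then applied to a header field.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class HeaderMatchSketch {

    // Hypothetical helper: copy every header plus the body into one record.
    // The field name "_attachment_body" is an assumption for this sketch.
    public static Map<String, Object> toRecord(Map<String, String> headers, byte[] body) {
        Map<String, Object> record = new LinkedHashMap<>(headers);
        record.put("_attachment_body", body);
        return record;
    }

    // True if the named field exists, is a string, and fully matches the regex.
    public static boolean fieldMatches(Map<String, Object> record, String field, Pattern regex) {
        Object value = record.get(field);
        return value instanceof String && regex.matcher((String) value).matches();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("host", "web-01");
        byte[] body = "GET /index.html".getBytes(StandardCharsets.UTF_8);

        Map<String, Object> record = toRecord(headers, body);
        Pattern hostPattern = Pattern.compile("web-\\d+");
        System.out.println("valid = " + fieldMatches(record, "host", hostPattern));
    }
}
```

In a real setup the matching would be expressed as morphline commands in the 
morphline config file rather than in Java.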

Wolfgang.

On Feb 13, 2014, at 9:06 PM, Jeff Lord wrote:

> Wolfgang,
> 
> Will the morphline interceptor + grok actually match event headers or
> just the event body?
> 
> -Jeff
> 
> On Thu, Feb 13, 2014 at 10:05 AM, Wolfgang Hoschek
>  wrote:
>> You could probably do this with a MorphlineInterceptor, e.g. by using the 
>> grok command in combination with the tryRules command.
>> 
>> http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor
>> http://kitesdk.org/docs/current/kite-morphlines/index.html
>> http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#grok
>> http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#tryRules
>> 
>> Wolfgang.
>> 
>> On Feb 13, 2014, at 7:58 PM, Nikolaos Tsipas wrote:
>> 
>>> Hello,
>>> 
>>> We have a use case that requires the validation of headers on events 
>>> received by an avro source in order to consider an event as valid or 
>>> invalid. If an event is invalid then it should be routed to a different 
>>> channel.
>>> 
>>> We know how to route events based on the values of specific headers using 
>>> multiplexing. However, for the regex validation of headers flume doesn't 
>>> seem to provide any appropriate interceptors.
>>> 
>>> For this reason, we are thinking to create a new interceptor that would 
>>> allow regex validation of headers and depending on the outcome a header 
>>> would be added (e.g. valid = true)
>>> 
>>> Questions:
>>> 
>>> * Does the above sound like a reasonable solution for what we want to 
>>> achieve?
>>> * What would be the best way to implement it in order to be beneficial for 
>>> the flume community? Extend the functionality of one of the existing 
>>> interceptors (e.g. RegexFilteringInterceptor) or provide a new one?
>>> 
>>> Regards,
>>> Nikolaos
>>> 
>> 



Re: Event header validation using interceptors

2014-02-13 Thread Wolfgang Hoschek
You could probably do this with a MorphlineInterceptor, e.g. by using the grok 
command in combination with the tryRules command. 

http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor
http://kitesdk.org/docs/current/kite-morphlines/index.html
http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#grok
http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#tryRules

Wolfgang.

On Feb 13, 2014, at 7:58 PM, Nikolaos Tsipas wrote:

> Hello,
> 
> We have a use case that requires the validation of headers on events received 
> by an avro source in order to consider an event as valid or invalid. If an 
> event is invalid then it should be routed to a different channel.
> 
> We know how to route events based on the values of specific headers using 
> multiplexing. However, for the regex validation of headers flume doesn't seem 
> to provide any appropriate interceptors.
> 
> For this reason, we are thinking to create a new interceptor that would allow 
> regex validation of headers and depending on the outcome a header would be 
> added (e.g. valid = true)
> 
> Questions:
> 
> * Does the above sound like a reasonable solution for what we want to achieve?
> * What would be the best way to implement it in order to be beneficial for 
> the flume community? Extend the functionality of one of the existing 
> interceptors (e.g. RegexFilteringInterceptor) or provide a new one?
> 
> Regards,
> Nikolaos
> 



[jira] [Updated] (FLUME-2317) Upgrade Dataset sink to kite-0.11.0

2014-02-08 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2317:


Attachment: FLUME-2317-v1.patch

> Upgrade Dataset sink to kite-0.11.0
> ---
>
> Key: FLUME-2317
> URL: https://issues.apache.org/jira/browse/FLUME-2317
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>        Reporter: wolfgang hoschek
> Fix For: v1.5.0
>
> Attachments: FLUME-2317-v1.patch
>
>
> Release notes are here: http://kitesdk.org/docs/current/release_notes.html





[jira] [Created] (FLUME-2317) Upgrade Dataset sink to kite-0.11.0

2014-02-08 Thread wolfgang hoschek (JIRA)
wolfgang hoschek created FLUME-2317:
---

 Summary: Upgrade Dataset sink to kite-0.11.0
 Key: FLUME-2317
 URL: https://issues.apache.org/jira/browse/FLUME-2317
 Project: Flume
  Issue Type: Improvement
  Components: Sinks+Sources
Reporter: wolfgang hoschek
 Fix For: v1.5.0


Release notes are here: http://kitesdk.org/docs/current/release_notes.html






[jira] [Updated] (FLUME-2316) Upgrade MorphlineSolrSink to kite-0.11.0

2014-02-08 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2316:


Attachment: FLUME-2316-v2.patch

> Upgrade MorphlineSolrSink to kite-0.11.0
> 
>
> Key: FLUME-2316
> URL: https://issues.apache.org/jira/browse/FLUME-2316
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
> Fix For: v1.5.0
>
> Attachments: FLUME-2316-v2.patch
>
>
> Release notes are here: http://kitesdk.org/docs/current/release_notes.html





[jira] [Updated] (FLUME-2316) Upgrade MorphlineSolrSink to kite-0.11.0

2014-02-08 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2316:


Attachment: (was: FLUME-2316-v1.patch)

> Upgrade MorphlineSolrSink to kite-0.11.0
> 
>
> Key: FLUME-2316
> URL: https://issues.apache.org/jira/browse/FLUME-2316
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
> Fix For: v1.5.0
>
> Attachments: FLUME-2316-v2.patch
>
>
> Release notes are here: http://kitesdk.org/docs/current/release_notes.html





[jira] [Updated] (FLUME-2316) Upgrade MorphlineSolrSink to kite-0.11.0

2014-02-08 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2316:


Attachment: FLUME-2316-v1.patch

> Upgrade MorphlineSolrSink to kite-0.11.0
> 
>
> Key: FLUME-2316
> URL: https://issues.apache.org/jira/browse/FLUME-2316
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
> Fix For: v1.5.0
>
> Attachments: FLUME-2316-v1.patch
>
>
> Release notes are here: http://kitesdk.org/docs/current/release_notes.html





[jira] [Created] (FLUME-2316) Upgrade MorphlineSolrSink to kite-0.11.0

2014-02-08 Thread wolfgang hoschek (JIRA)
wolfgang hoschek created FLUME-2316:
---

 Summary: Upgrade MorphlineSolrSink to kite-0.11.0
 Key: FLUME-2316
 URL: https://issues.apache.org/jira/browse/FLUME-2316
 Project: Flume
  Issue Type: Improvement
  Components: Sinks+Sources
Affects Versions: v1.4.0
Reporter: wolfgang hoschek
 Fix For: v1.5.0


Release notes are here: http://kitesdk.org/docs/current/release_notes.html





[jira] [Commented] (FLUME-2315) org.apache.flume.sink.solr.morphline.BlobDeserializer is unable to handle empty streams

2014-02-08 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895507#comment-13895507
 ] 

wolfgang hoschek commented on FLUME-2315:
-

What's the exception you are seeing in production?

The underlying BlobDeserializer returns a null event for an empty stream in 
order to signal that empty streams should be ignored. Is this a problem?

The testEmptyStream test suggested above fails because the validateMiniParse() 
test helper currently isn't designed for empty streams. If you wanted to make 
it handle empty streams you'd change it to read something like this:

{code}
  private void validateMiniParse(EventDeserializer des) throws IOException {
    Event evt;

    des.mark();
    evt = des.readEvent();
    if (mini.length() != 0) {
      assertEquals(new String(evt.getBody()), mini);
    } else {
      assertNull(evt);
    }
    des.reset(); // reset!

    evt = des.readEvent();
    if (mini.length() != 0) {
      assertEquals("data should be repeated, " +
          "because we reset() the stream", new String(evt.getBody()), mini);
    } else {
      assertNull(evt);
    }

    evt = des.readEvent();
    assertNull("Event should be null because there are no lines " +
        "left to read", evt);

    des.mark();
    des.close();
  }
{code}


> org.apache.flume.sink.solr.morphline.BlobDeserializer is unable to handle 
> empty streams
> ---
>
> Key: FLUME-2315
> URL: https://issues.apache.org/jira/browse/FLUME-2315
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Muhammad Ehsan ul Haque
> Fix For: v1.4.0
>
>
> org.apache.flume.sink.solr.morphline.BlobDeserializer does not handle empty 
> streams correctly. For example, if the deserializer is used with a spooling 
> directory containing empty files, then the empty files will not be consumed 
> and an exception is thrown.
> The following test will also fail in 
> org.apache.flume.sink.solr.morphline.TestBlobDeserializer:
> {code}
>   @Test
>   public void testEmptyStream() throws IOException {
>     mini = "";
>     ResettableInputStream in = new ResettableTestStringInputStream(mini);
>     EventDeserializer des = new BlobDeserializer(new Context(), in);
>     validateMiniParse(des);
>   }
> {code}





Re: [DISCUSS] Release Flume 1.5.0

2014-01-30 Thread Wolfgang Hoschek
+1 There are many important new features and fixes ready to go.

Wolfgang.

On Jan 30, 2014, at 7:43 PM, Chiwan Park wrote:

> +1 on new release!
> 
> --
> Regards,
> Chiwan Park
> 
> On Jan 31, 2014, at 2:17 AM, Hari Shreedharan  
> wrote:
> 
>> Hi folks,
>> 
>> It has been about 6 months since we did a release. We have added several
>> new features and fixed a lot of bugs. What do you guys think about
>> releasing Flume 1.5.0?
>> 
>> 
>> Thanks
>> Hari
> 



[jira] [Updated] (FLUME-2275) Improve scalability of MorphlineInterceptor under contention

2014-01-02 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2275:


Attachment: FLUME-2275-v4.patch

revised patch according to Hari's suggestions

> Improve scalability of MorphlineInterceptor under contention
> 
>
> Key: FLUME-2275
> URL: https://issues.apache.org/jira/browse/FLUME-2275
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
>Assignee: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2275-v3.patch, FLUME-2275-v4.patch
>
>






[jira] [Commented] (FLUME-2275) Improve scalability of MorphlineInterceptor under contention

2014-01-02 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860913#comment-13860913
 ] 

wolfgang hoschek commented on FLUME-2275:
-

Sure, +1 for adding those suggestions, or do you want me to send another 
corresponding patch?

> Improve scalability of MorphlineInterceptor under contention
> 
>
> Key: FLUME-2275
> URL: https://issues.apache.org/jira/browse/FLUME-2275
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: wolfgang hoschek
>    Assignee: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2275-v3.patch
>
>






[jira] [Updated] (FLUME-2275) Improve scalability of MorphlineInterceptor under contention

2013-12-17 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2275:


Attachment: (was: FLUME-2275-v1.patch)

> Improve scalability of MorphlineInterceptor under contention
> 
>
> Key: FLUME-2275
> URL: https://issues.apache.org/jira/browse/FLUME-2275
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
>    Assignee: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2275-v3.patch
>
>






[jira] [Updated] (FLUME-2275) Improve scalability of MorphlineInterceptor under contention

2013-12-17 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2275:


Attachment: (was: FLUME-2275-v2.patch)

> Improve scalability of MorphlineInterceptor under contention
> 
>
> Key: FLUME-2275
> URL: https://issues.apache.org/jira/browse/FLUME-2275
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
>    Assignee: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2275-v3.patch
>
>






[jira] [Updated] (FLUME-2275) Improve scalability of MorphlineInterceptor under contention

2013-12-17 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2275:


Attachment: FLUME-2275-v3.patch

> Improve scalability of MorphlineInterceptor under contention
> 
>
> Key: FLUME-2275
> URL: https://issues.apache.org/jira/browse/FLUME-2275
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
>    Assignee: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2275-v3.patch
>
>






[jira] [Updated] (FLUME-2275) Improve scalability of MorphlineInterceptor under contention

2013-12-17 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2275:


Attachment: FLUME-2275-v2.patch

> Improve scalability of MorphlineInterceptor under contention
> 
>
> Key: FLUME-2275
> URL: https://issues.apache.org/jira/browse/FLUME-2275
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
>    Assignee: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2275-v1.patch, FLUME-2275-v2.patch
>
>






[jira] [Updated] (FLUME-2275) Improve scalability of MorphlineInterceptor under contention

2013-12-17 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2275:


Attachment: FLUME-2275-v1.patch

Patch is attached.

> Improve scalability of MorphlineInterceptor under contention
> 
>
> Key: FLUME-2275
> URL: https://issues.apache.org/jira/browse/FLUME-2275
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
>    Assignee: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2275-v1.patch
>
>






[jira] [Created] (FLUME-2275) Improve scalability of MorphlineInterceptor under contention

2013-12-17 Thread wolfgang hoschek (JIRA)
wolfgang hoschek created FLUME-2275:
---

 Summary: Improve scalability of MorphlineInterceptor under 
contention
 Key: FLUME-2275
 URL: https://issues.apache.org/jira/browse/FLUME-2275
 Project: Flume
  Issue Type: Improvement
  Components: Sinks+Sources
Affects Versions: v1.4.0
Reporter: wolfgang hoschek
Assignee: wolfgang hoschek
 Fix For: v1.4.1, v1.5.0








[jira] [Updated] (FLUME-2266) Update Morphline Sink to kite-0.10.0

2013-12-10 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2266:


Attachment: FLUME-2266-v1.patch

Patch is attached

> Update Morphline Sink to kite-0.10.0
> 
>
> Key: FLUME-2266
> URL: https://issues.apache.org/jira/browse/FLUME-2266
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2266-v1.patch
>
>
> The CDK project was renamed to "Kite". Correspondingly, we should upgrade to 
> the Kite release. Release notes are here: 
> http://kitesdk.org/docs/current/release_notes.html





[jira] [Created] (FLUME-2266) Update Morphline Sink to kite-0.10.0

2013-12-10 Thread wolfgang hoschek (JIRA)
wolfgang hoschek created FLUME-2266:
---

 Summary: Update Morphline Sink to kite-0.10.0
 Key: FLUME-2266
 URL: https://issues.apache.org/jira/browse/FLUME-2266
 Project: Flume
  Issue Type: Bug
  Components: Sinks+Sources
Affects Versions: v1.4.0
Reporter: wolfgang hoschek
 Fix For: v1.4.1, v1.5.0


The CDK project was renamed to "Kite". Correspondingly, we should upgrade to 
the Kite release. Release notes are here: 
http://kitesdk.org/docs/current/release_notes.html





Re: New Features Proposed for Apache Flume

2013-11-16 Thread Wolfgang Hoschek
FYI, I've just added a new morphline command that returns Geolocation 
information for a given IP address, using an efficient in-memory Maxmind 
database lookup - https://issues.cloudera.org/browse/CDK-227

This can then be used in the MorphlineInterceptor or Morphline Sink.
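
The in-memory lookup idea can be sketched roughly as follows. This is not the 
Maxmind API or file format, and not the actual CDK-227 implementation; the 
class GeoLookupSketch, its methods, and the sample ranges are hypothetical. It 
only assumes non-overlapping IPv4 ranges, keyed by start address in a sorted 
map.

```java
import java.util.Map;
import java.util.TreeMap;

public class GeoLookupSketch {

    // One address range: inclusive end address plus the label it maps to.
    private static final class Range {
        final long end;
        final String location;

        Range(long end, String location) {
            this.end = end;
            this.location = location;
        }
    }

    // Ranges keyed by their start address; assumed not to overlap.
    private final TreeMap<Long, Range> ranges = new TreeMap<>();

    public void addRange(String startIp, String endIp, String location) {
        ranges.put(toLong(startIp), new Range(toLong(endIp), location));
    }

    // Returns the location for the address, or null if no range contains it.
    public String lookup(String ip) {
        long key = toLong(ip);
        // Greatest start address <= key, found in O(log n).
        Map.Entry<Long, Range> entry = ranges.floorEntry(key);
        if (entry == null || key > entry.getValue().end) {
            return null;
        }
        return entry.getValue().location;
    }

    // Converts dotted-quad IPv4 text into an unsigned 32-bit value held in a long.
    public static long toLong(String ipv4) {
        long result = 0;
        for (String part : ipv4.split("\\.")) {
            result = (result << 8) | Integer.parseInt(part);
        }
        return result;
    }

    public static void main(String[] args) {
        GeoLookupSketch geo = new GeoLookupSketch();
        // Documentation-only address blocks with made-up labels.
        geo.addRange("192.0.2.0", "192.0.2.255", "Example City A");
        geo.addRange("198.51.100.0", "198.51.100.255", "Example City B");
        System.out.println(geo.lookup("192.0.2.17"));
        System.out.println(geo.lookup("203.0.113.5"));
    }
}
```

Because the whole table lives in a sorted map in RAM, each lookup is a single 
floorEntry call, with no index or query engine involved.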

Wolfgang.

On Sep 7, 2013, at 8:47 PM, Israel Ekpo wrote:

> Thank you everyone for your very constructive feedback. It was very
> helpful.
> 
> To provide some background, most of these suggestions have been inspired by
> features I have found in Logstash [3].
> 
> I am going to spend more time to understand how the cdk morphline commands
> [4] work because I think it will really help with the transformation utils
> needed in FileSource.
> 
> Regarding the GrokInterceptor, I was not aware of the existence of
> MorphlineInterceptor. It already does what I was proposing with
> GrokInterceptor. So we are cool from that end.
> 
> In simple standalone tests, the commons-io class that I am planning to use
> for the FileSource handles file rotations well but I have not tested
> renames or removals yet.
> 
> Regarding the GeoIPInterceptor we can provide links for downloading the
> Maxmind database separately without bundling the IP database with Flume
> releases.
> 
> This is how the Logstash project does it.
> 
> Because of the large number of events expected, I was planning to use
> Lucene because of the speed of executing range queries from trie indexing
> [5] and the results can also be cached in-memory if they have been
> previously executed.
> 
> I can perform some benchmarks with and without Lucene and see if the
> performance differences justify using it for the lookups.
> 
> My gut feeling is that using Lucene will lead to shorter processing times
> as the volume of events increases.
> 
> The RedisSource and RedisSink features will just be simple sources and
> sinks. The sink will push [1] events to the Redis server and the source
> will do a blocking pop [2] as it waits for new events to occur on the Redis
> Server.
> 
> I am still trying out a few things, this part is not yet finalized.
> 
> Regarding contributing features as plugins, how are plugins typically
> contributed and managed?
> 
> Do I have to create github repo and manage it independently or are they
> contributed as patches to the Flume project?
> 
> [1] http://redis.io/commands/rpush
> [2] http://redis.io/commands/blpop
> [3] http://logstash.net/docs/1.2.1/
> [4] http://cloudera.github.io/cdk/docs/0.6.0/cdk-morphlines/index.html
> [5]
> http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/NumericRangeQuery.html
> 
> *Author and Instructor for the Upcoming Book and Lecture Series*
> *Massive Log Data Aggregation, Processing, Searching and Visualization with
> Open Source Software*
> *http://massivelogdata.com*
> 
> 
> On Wed, Aug 28, 2013 at 1:21 PM, Wolfgang Hoschek 
> wrote:
> 
>> Re: GrokInterceptor
>> 
>> This functionality is already available in the form of the Apache Flume
>> MorphlineInterceptor [1] with the grok command [2]. While grok is very
>> useful, consider that grok alone often isn't enough - you typically need
>> some other log event processing commands as well, for example as contained
>> in morphlines [3].
>> 
>> Re: FileSource
>> 
>> True file tailing would be great.
>> 
>> Merging multiple lines into one event can already be done with the
>> MorphlineInterceptor with the readMultiLine command [4]. Or maybe embed a
>> morphline directly into that new FileSource?
>> 
>> Re: GeoIPInterceptor
>> 
>> Seems to me that it would be more flexible, powerful and reusable to add
>> this kind of functionality as a morphline command - contributions welcome!
>> 
>> Finally, a word of caution, Maxmind is a good geo db, and I've used it
>> before, but it has some LGPL issues that may or may not be workable in this
>> context. Maxmind db fits into RAM - Lucene seems like overkill here - you
>> can do fast maxmind lookups directly without Lucene.
>> 
>> [1] http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor
>> [2]
>> http://cloudera.github.io/cdk/docs/0.6.0/cdk-morphlines/morphlinesReferenceGuide.html#grok
>> [3] http://cloudera.github.io/cdk/docs/0.6.0/cdk-morphlines/index.html
>> [4]
>> http://cloudera.github.io/cdk/docs/0.6.0/cdk-morphlines/morphlinesReferenceGuide.html#readMultiLine
>> 
>> Wolfgang.
>> 
>>> 
>>> *FileSource*
>>> 
>>> Using the Tailer feature from Apache Commons I/O utility [1], we can tail
>>> specific files for events.
>>> 
>>> This allows u

Re: MorphlineInterceptor questions

2013-11-11 Thread Wolfgang Hoschek

On Nov 11, 2013, at 9:09 PM, Otis Gospodnetic wrote:

> Hi,
> 
> While poking around MorphlineSolrSink I got intrigued by
> MorphlineInterceptor in ...solr.morphline package.  A few Qs:
> 
> 1) This is also not Solr-specific, right?

yep

> 
> 2) I couldn't find any code in ...solr.morphline package that actually
> uses this MorphlineInterceptor... is it not used?

In Flume an Interceptor is a separate concept from a Sink. You can use the 
Interceptor without the Sink, and vice versa.

> 
> 3) I see Morphline command's "process(...)" method being called from
> both MorphlineInterceptor AND from MorphlineHandlerImpl.  How come?  My
> impression is that MorphlineHandlerImpl code is what is actually meant
> to be used, while MorphlineInterceptor doesn't seem to be used...
> what am I missing? :)
> 
> 4) I found the following in the Flume Guide: "This interceptor is not
> intended for heavy duty ETL processing - if you need this consider
> moving ETL processing from the Flume Source to a Flume Sink".
> Why should one not use MorphlineInterceptor for heavy duty ETL processing?

Two reasons: 

1) Interceptors are running in the thread of the Flume Source, and are thus 
tightly coupled to the Flume Source and the I/O handler of the Flume Source. 
It's safer to not block or fail in that thread - better to hand data off of 
that thread as soon as possible into the Flume Channel (i.e. a queue from which 
sinks take events - sinks run in another thread and are thus more isolated). 

2) Flume Interceptors have the limitation that they can only generate zero or 
one output event for each input event. So generating N events for an input 
event isn't possible, like one might want to do when emitting one event per 
input line, or one event per input column, or one event per email 
attachment, etc. 

To summarize, the reasons aren't specific to morphlines, they are rooted in the 
way Flume has designed the concept of Interceptors. 
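
The second limitation can be illustrated with a small plain-Java sketch. The 
names FanOutSketch, interceptOne and splitLines are hypothetical and do not 
mirror the Flume interfaces; the point is only the difference in shape between 
a step that returns at most one output per input and a step that is free to 
return N.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class FanOutSketch {

    // Interceptor-style step: zero or one output per input event.
    // An empty Optional means the event was dropped.
    public static Optional<String> interceptOne(String event) {
        String trimmed = event.trim();
        return trimmed.isEmpty() ? Optional.empty() : Optional.of(trimmed);
    }

    // Sink-style step: may emit N outputs per input event,
    // here one output per non-empty input line.
    public static List<String> splitLines(String event) {
        List<String> out = new ArrayList<>();
        for (String line : event.split("\n")) {
            if (!line.isEmpty()) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String event = "line one\nline two\nline three";
        System.out.println(interceptOne(event).isPresent());
        System.out.println(splitLines(event).size());
    }
}
```

With the first signature there is simply nowhere to put the second and third 
line, which is why per-line or per-attachment splitting belongs in a sink.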

Wolfgang.

> 
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/



Re: Questions about Morphline Solr Sink structure

2013-11-11 Thread Wolfgang Hoschek
Breaking backwards compat isn't an option for enterprise customers, especially 
if the only gain is making a bunch of names a little more pleasant.

Wolfgang.

On Nov 11, 2013, at 8:56 PM, Otis Gospodnetic wrote:

> Hi,
> 
> Hm, I don't get something here.  The class name is misleading/wrong,
> no?  Why not go through the usual deprecation steps to avoid breaking
> anything during the next release and then remove the
> misnamed/misplaced classes completely?
> 
> Also, I don't know enough about this code to understand fully why any
> code here would need to ship without (unit) tests...
> 
> While people could use MorphlineSolrSink even if they are not using it
> with Solr, wouldn't that be a little messy? :)
> 
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 
> On Mon, Nov 11, 2013 at 8:25 PM, Roshan Naik  wrote:
>> imho...would be nice if the code changes were done... but renaming it in
>> the user guide (without changing FQCNs) can be done regardless. and perhaps
>> more impt from a user perspective..
>> 
>> 
>> On Mon, Nov 11, 2013 at 4:04 PM, Wolfgang Hoschek 
>> wrote:
>> 
>>> Yep, the names are a bit misleading now that so much has been generalized,
>>> but whatever we do, breaking backwards compat isn't an option. Shipping a
>>> sink without tests doesn't seem compelling to me either.
>>> 
>>> Taste in names aside, as far as I can see you could use this sink for ES
>>> today without any issues.
>>> 
>>> Wolfgang.
>>> 
>>> On Nov 11, 2013, at 4:00 PM, Hari Shreedharan wrote:
>>> 
>>>> Hi Otis,
>>>> 
>>>> I don’t mind doing any of that - but the problem is that such a change
>>> could impact backward compatibility - so we’d need to keep the stubs around
>>> even though the actual functionality might be elsewhere.
>>>> 
>>>> 
>>>> Thanks,
>>>> Hari
>>>> 
>>>> 
>>>> On Monday, November 11, 2013 at 3:54 PM, Otis Gospodnetic wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Thanks for the info, everyone.
>>>>> Yes, I noticed after my email that Blob* classes were in the process
>>>>> of being moved.
>>>>> Here is what I feel should really be done:
>>>>> 
>>>>> * get rid of solr.morphline package and move the code to
>>>>> ...morphline package
>>>>> * get rid of any Solr-specific code (I guess just in the tests
>>>>> Wolfgang mentioned)
>>>>> * rename the sink to MorphlineSink
>>>>> 
>>>>> Thoughts?
>>>>> 
>>>>> Re loadElasticSearch() - yes, I see Wolfgang saw I opened an issue for
>>>>> that in CDK.
>>>>> 
>>>>> Thanks,
>>>>> Otis
>>>>> --
>>>>> Performance Monitoring * Log Analytics * Search Analytics
>>>>> Solr & Elasticsearch Support * http://sematext.com/
>>>>> 
>>>>> 
>>>>> On Mon, Nov 11, 2013 at 4:34 PM, Roshan Naik 
>>>>> >> ros...@hortonworks.com)> wrote:
>>>>>> We should consider renaming the Morphline Solr Sink to Morphline Sink in
>>>>>> the docs to avoid any possibility of misleading end users.
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 



Re: Questions about Morphline Solr Sink structure

2013-11-11 Thread Wolfgang Hoschek
Yep, renaming it in the user guide (without changing FQCNs) seems like a really 
good idea.

Wolfgang.

On Nov 11, 2013, at 5:25 PM, Roshan Naik wrote:

> imho...would be nice if the code changes were done... but renaming it in
> the user guide (without changing FQCNs) can be done regardless. and perhaps
> more impt from a user perspective..
> 
> 
> On Mon, Nov 11, 2013 at 4:04 PM, Wolfgang Hoschek 
> wrote:
> 
>> Yep, the names are a bit misleading now that so much has been generalized,
>> but whatever we do, breaking backwards compat isn't an option. Shipping a
>> sink without tests doesn't seem compelling to me either.
>> 
>> Taste in names aside, as far as I can see you could use this sink for ES
>> today without any issues.
>> 
>> Wolfgang.
>> 
>> On Nov 11, 2013, at 4:00 PM, Hari Shreedharan wrote:
>> 
>>> Hi Otis,
>>> 
>>> I don’t mind doing any of that - but the problem is that such a change
>>> could impact backward compatibility - so we’d need to keep the stubs around
>>> even though the actual functionality might be elsewhere.
>>> 
>>> 
>>> Thanks,
>>> Hari
>>> 
>>> 
>>> On Monday, November 11, 2013 at 3:54 PM, Otis Gospodnetic wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Thanks for the info, everyone.
>>>> Yes, I noticed after my email that Blob* classes were in the process
>>>> of being moved.
>>>> Here is what I feel should really be done:
>>>> 
>>>> * get rid of solr.morphline package and move the code to
>>>> ...morphline package
>>>> * get rid of any Solr-specific code (I guess just in the tests
>>>> Wolfgang mentioned)
>>>> * rename the sink to MorphlineSink
>>>> 
>>>> Thoughts?
>>>> 
>>>> Re loadElasticSearch() - yes, I see Wolfgang saw I opened an issue for
>>>> that in CDK.
>>>> 
>>>> Thanks,
>>>> Otis
>>>> --
>>>> Performance Monitoring * Log Analytics * Search Analytics
>>>> Solr & Elasticsearch Support * http://sematext.com/
>>>> 
>>>> 
>>>> On Mon, Nov 11, 2013 at 4:34 PM, Roshan Naik wrote:
>>>>> We should consider renaming the Morphline Solr Sink to Morphline sink in the
>>>>> docs to avoid any possibility of misleading end users.
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 



Re: Questions about Morphline Solr Sink structure

2013-11-11 Thread Wolfgang Hoschek
Yep, the names are a bit misleading now that so much has been generalized, but 
whatever we do, breaking backwards compat isn't an option. Shipping a sink 
without tests doesn't seem compelling to me either. 

Taste in names aside, as far as I can see you could use this sink for ES today 
without any issues.

Wolfgang.

On Nov 11, 2013, at 4:00 PM, Hari Shreedharan wrote:

> Hi Otis,  
> 
> I don’t mind doing any of that - but the problem is that such a change could 
> impact backward compatibility - so we’d need to keep the stubs around even 
> though the actual functionality might be elsewhere.   
> 
> 
> Thanks,
> Hari
> 
> 
> On Monday, November 11, 2013 at 3:54 PM, Otis Gospodnetic wrote:
> 
>> Hi,
>> 
>> Thanks for the info, everyone.
>> Yes, I noticed after my email that Blob* classes were in the process
>> of being moved.
>> Here is what I feel should really be done:
>> 
>> * get rid of solr.morphline package and move the code to
>> ...morphline package
>> * get rid of any Solr-specific code (I guess just in the tests
>> Wolfgang mentioned)
>> * rename the sink to MorphlineSink
>> 
>> Thoughts?
>> 
>> Re loadElasticSearch() - yes, I see Wolfgang saw I opened an issue for
>> that in CDK.
>> 
>> Thanks,
>> Otis
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>> 
>> 
>> On Mon, Nov 11, 2013 at 4:34 PM, Roshan Naik wrote:
>>> We should consider renaming the Morphline Solr Sink to Morphline sink in the
>>> docs to avoid any possibility of misleading end users.
>>> 
>>> 
>> 
>> 
>> 
> 
> 



Re: Questions about Morphline Solr Sink structure

2013-11-11 Thread Wolfgang Hoschek
Hi Otis,

You bring up a lot of very good points here, indeed. I'll try to answer as best 
as I can...

In the early days this Flume sink started out as being very Solr specific. Over 
time I have made it more generic and steadily reduced the dependency on Solr, 
and at this point the code has no dependency on Solr left (except in some 
tests that straddle the boundary between unit tests and integration tests). 
So it wouldn't be technically wrong to refer to this as a Morphline Sink. The 
current name simply reflects how the sink evolved, and it is kept for 
backwards compatibility.

You could easily use this sink to extract, transform and load data into ES (or 
any other app or database or storage system) without pulling in any Solr 
related jar. To do so you'd write a loadElasticSearch morphline command in a 
separate morphline maven module, and use that command instead of the loadSolr 
command in your morphline config files. The new loadElasticSearch command would 
convert a morphline record to a data structure appropriate for ES, e.g. ES 
JSON/Smile, and send that to ES. That's all there is to it, really.

A morphline record is essentially a hash table where the keys are strings and 
the values are a list of arbitrary Java objects. Those Java objects are 
typically Strings and Integers, but they can also be InputStreams or byte[] 
BLOBs, Avro objects, etc. This data model corresponds exactly to the features 
of the Lucene data model. It can also be seen as a superset of the Flume event 
data model - the Flume body is a byte[] value in the morphline _attachment_body 
field. The data model also maps well to the relational model. It also can be 
used for hierarchical data considering that the values in a morphline record 
field can be Avro, JSON, XML, protobufs, or any other custom complex data 
structure.
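
To make the record data model described above concrete, here is a minimal sketch in plain Java. It does not use the morphline library: the class `RecordSketch` and its methods are hypothetical names for illustration, and copying Flume headers into ordinary fields is an assumption. Only the `_attachment_body` field name comes from the description above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a morphline record:
// string keys, each mapped to a list of arbitrary Java objects.
class RecordSketch {
    private final Map<String, List<Object>> fields = new LinkedHashMap<>();

    // Append a value to a field; one field may hold several values of different types.
    void put(String key, Object value) {
        fields.computeIfAbsent(key, k -> new ArrayList<Object>()).add(value);
    }

    // Return the values of a field, or an empty list if the field is absent.
    List<Object> get(String key) {
        List<Object> values = fields.get(key);
        return values == null ? new ArrayList<Object>() : values;
    }

    // Model a Flume event as a record: headers become ordinary fields (an assumption),
    // and the body is stored as a byte[] value in the _attachment_body field.
    static RecordSketch fromFlumeEvent(Map<String, String> headers, byte[] body) {
        RecordSketch record = new RecordSketch();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            record.put(e.getKey(), e.getValue());
        }
        record.put("_attachment_body", body);
        return record;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("host", "web01");
        RecordSketch record = fromFlumeEvent(headers, "hello".getBytes(StandardCharsets.UTF_8));
        record.put("tags", "error");
        record.put("tags", 500); // values within one field can have different types
        System.out.println(record.get("host"));
        System.out.println(record.get("tags"));
        System.out.println(((byte[]) record.get("_attachment_body").get(0)).length);
    }
}
```

A loader command for another backend would then only need to walk such a map and translate it into whatever structure that backend expects.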

Wolfgang.

On Nov 10, 2013, at 4:42 PM, Otis Gospodnetic wrote:

> Hello,
> 
> One more "proactive" question.
> 
> Isn't all code under the  solr/morphline package not really about
> Morphline *Solr* Sink, but really more about *Morphline* Sink?
> In other words, if where Morphline actually outputs is dictated by the
> Morphline command in Morphline config (e.g. loadSolr()), then as far
> as Flume is concerned, isn't that really just *Morphline* Sink?
> 
> For example, if I wanted to get Flume to pass events through Morphline
> and have Morphline output to Elasticsearch, I wouldn't really want to
> add a whole new Elasticsearch Morphline Sink.  I should really just be
> able to use the existing (misnamed?) Morphline Solr Sink and just
> point it to a Morphline config that has loadElasticsearch() instead of
> loadSolr().
> 
> (please ignore the fact Morphline doesn't actually have
> loadElasticsearch() yet - I think this is a Morphline issue, not a
> Flume issue)
> 
> Is the above correct?
> 
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 
> On Sun, Nov 10, 2013 at 7:29 PM, Otis Gospodnetic
>  wrote:
>> Hello,
>> 
>> Warning: I've got a Flume NG and Morphlines newbie status
>> 
>> I was looking at Morphline Solr Sink to see how one could write an
>> equivalent Morphline Elasticsearch Sink, but after looking at the
>> code, I'm a bit confused.  Here are my Qs:
>> 
>> 1)  interface MorphlineHandler mentions Solr in N places, but it
>> doesn't seem to be Solr-specific.  Couldn't one reuse this interface
>> for a Morphline ES Sink?
>> 
>> 2) In general, couldn't/shouldn't a few classes from the
>> org.apache.flume.sink.solr.morphline package really live outside
>> anything Solr-specific? e.g.  org.apache.flume.sink.morphline for
>> those that are Morphline-specific?
>> 
>> 3) Similarly, BlobDeserializer and BlobHandler don't seem to be even
>> Morphline-specific.  Shouldn't they be elsewhere?
>> 
>> 4) I was expecting to see SolrJ (Solr Java client library) being used
>> in MorphlineHandlerImpl or MorphlineSolrSink to send events to Solr,
>> but there is no trace of SolrJ there.  How exactly does this load
>> Flume events into Solr then?
>> Ooooh, is that because when using this sink one is supposed to provide
>> a Morphline config and this config has a hard-coded loadSolr()
>> command?
>> 
>> 5) Would it make sense to refactor any of the current Morphline Solr
>> Sink code to make it easier to add things like a Morphline Elasticsearch
>> Sink?  If so, any guidance you could provide would be very helpful.
>> 
>> Thanks,
>> Otis
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/



Re: Morphline Solr sink dependencies on solr

2013-11-07 Thread Wolfgang Hoschek
Sounds like a misunderstanding - the Solr deps of the Morphline solr sink are 
already marked as "optional" or "test" as far as I can see. What issue are you 
running into?

Wolfgang.

On Nov 7, 2013, at 6:29 PM, Roshan Naik wrote:

> Elastic search sink marks its dependencies on  elastic search libraries as
> optional.
> However the Morphline solr sink does not mark its deps on solr as optional
> ( provided )
> 
> Is the latter intentional or something that needs to be fixed ?
> 



[jira] [Commented] (FLUME-1988) Add Support for Additional Deserializers for SpoolingDirectorySource

2013-11-06 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815353#comment-13815353
 ] 

wolfgang hoschek commented on FLUME-1988:
-

Tip: Looks like in this case (read all lines) you could replace the 
readMultiLine command with a readClob command.

> Add Support for Additional Deserializers for SpoolingDirectorySource
> 
>
> Key: FLUME-1988
> URL: https://issues.apache.org/jira/browse/FLUME-1988
> Project: Flume
>  Issue Type: New Feature
>  Components: Docs, Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Israel Ekpo
>Assignee: Israel Ekpo
>  Labels: serializers
> Attachments: EventDeserializerType.java, 
> RegexDelimiterDeSerializer.java, ResettableTestStringInputStream.java, 
> TestRegexDelimiterDeSerializer.java
>
>
> There are certain use cases for SpoolingDirectorySource where the events in 
> the log file are not delimited with newline characters.
> Certain log files that contain stack traces, xml documents and pretty JSON 
> strings seem to contain multiple new line characters within each event.
> We can use alternative logic such as specific characters, strings or regular 
> expressions to determine when the event is complete.
> Hence I am proposing the following new deserializers based on 
> org.apache.flume.serialization.LineDeserializer
> # org.apache.flume.serialization.RegexDelimiterDeSerializer
> Allows the user to specify a regular expression that is a delimiter for 
> events within the log file
> # org.apache.flume.serialization.CharSequenceDelimiterDeSerializer
> Allows the user to specify a comma separated character sequence that is a 
> delimiter for events within the log file
> The user will specify an integer for the ascii characters and we will use 
> that as the delimiter.
> For example support for \r\n could be specified as 13,10
> A list of codes is available at http://www.asciitable.com/
> We will also need to update the user guide with examples on how to configure 
> and specify a custom deserializer.
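
As an illustration of the "13,10" convention proposed in the issue description above, the following sketch turns a comma separated list of decimal ASCII codes into a delimiter string and splits text on it. It is not the attached patch: `DelimiterCodesSketch`, `parseDelimiter` and `splitEvents` are hypothetical names, and it works on an in-memory string rather than on a Flume deserializer stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flume.
class DelimiterCodesSketch {

    // Parse a comma separated list of decimal ASCII codes, e.g. "13,10", into a delimiter string.
    static String parseDelimiter(String codes) {
        StringBuilder sb = new StringBuilder();
        for (String part : codes.split(",")) {
            int code = Integer.parseInt(part.trim());
            if (code < 0 || code > 127) {
                throw new IllegalArgumentException("Not an ASCII code: " + code);
            }
            sb.append((char) code);
        }
        return sb.toString();
    }

    // Split text into events on a literal (non-regex) delimiter.
    static List<String> splitEvents(String text, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("Delimiter must not be empty");
        }
        List<String> events = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = text.indexOf(delimiter, start)) >= 0) {
            events.add(text.substring(start, idx));
            start = idx + delimiter.length();
        }
        if (start < text.length()) {
            events.add(text.substring(start)); // trailing event without a final delimiter
        }
        return events;
    }

    public static void main(String[] args) {
        String delimiter = parseDelimiter("13,10"); // carriage return followed by line feed
        System.out.println(splitEvents("first\r\nsecond\r\nthird", delimiter));
    }
}
```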



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (FLUME-1988) Add Support for Additional Deserializers for SpoolingDirectorySource

2013-11-06 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815355#comment-13815355
 ] 

wolfgang hoschek commented on FLUME-1988:
-

Also, let me know your thoughts on potentially writing a MorphlineDeserializer 
that implements a java.io.InputStream on top of the SpoolingDirectorySource.

> Add Support for Additional Deserializers for SpoolingDirectorySource
> 
>
> Key: FLUME-1988
> URL: https://issues.apache.org/jira/browse/FLUME-1988
> Project: Flume
>  Issue Type: New Feature
>  Components: Docs, Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Israel Ekpo
>Assignee: Israel Ekpo
>  Labels: serializers
> Attachments: EventDeserializerType.java, 
> RegexDelimiterDeSerializer.java, ResettableTestStringInputStream.java, 
> TestRegexDelimiterDeSerializer.java
>
>
> There are certain use cases for SpoolingDirectorySource where the events in 
> the log file are not delimited with newline characters.
> Certain log files that contain stack traces, xml documents and pretty JSON 
> strings seem to contain multiple new line characters within each event.
> We can use alternative logic such as specific characters, strings or regular 
> expressions to determine when the event is complete.
> Hence I am proposing the following new deserializers based on 
> org.apache.flume.serialization.LineDeserializer
> # org.apache.flume.serialization.RegexDelimiterDeSerializer
> Allows the user to specify a regular expression that is a delimiter for 
> events within the log file
> # org.apache.flume.serialization.CharSequenceDelimiterDeSerializer
> Allows the user to specify a comma separated character sequence that is a 
> delimiter for events within the log file
> The user will specify an integer for the ascii characters and we will use 
> that as the delimiter.
> For example support for \r\n could be specified as 13,10
> A list of codes is available at http://www.asciitable.com/
> We will also need to update the user guide with examples on how to configure 
> and specify a custom deserializer.





Re: Review Request 15107: Moving BlobHandler out of morphline sink and into HTTP source

2013-11-04 Thread Wolfgang Hoschek

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15107/#review28165
---



flume-ng-doc/sphinx/FlumeUserGuide.rst
<https://reviews.apache.org/r/15107/#comment54792>

Same weird ElasticSearch issue



flume-ng-doc/sphinx/FlumeUserGuide.rst
<https://reviews.apache.org/r/15107/#comment54791>

same weird ElasticSearch issue...



flume-ng-sinks/flume-ng-morphline-solr-sink/src/main/java/org/apache/flume/sink/solr/morphline/BlobHandler.java
<https://reviews.apache.org/r/15107/#comment54790>

realBlobHandler seems obsolete now


- Wolfgang Hoschek


On Nov. 5, 2013, 1:42 a.m., Roshan Naik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15107/
> ---
> 
> (Updated Nov. 5, 2013, 1:42 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2226
> https://issues.apache.org/jira/browse/FLUME-2226
> 
> 
> Repository: flume-git
> 
> 
> Description
> ---
> 
> - Moved BlobHandler out of morphline sink and into HTTP source along with 
> tests. 
> - Updated docs to reflect new FQCN
> - Retained dummy class for old FQCN compat
> 
> 
> Diffs
> -
> 
>   flume-ng-core/src/main/java/org/apache/flume/source/http/BlobHandler.java 
> PRE-CREATION 
>   
> flume-ng-core/src/test/java/org/apache/flume/source/http/FlumeHttpServletByteRequestWrapper.java
>  PRE-CREATION 
>   
> flume-ng-core/src/test/java/org/apache/flume/source/http/TestBlobHandler.java 
> PRE-CREATION 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3a3038c 
>   
> flume-ng-sinks/flume-ng-morphline-solr-sink/src/main/java/org/apache/flume/sink/solr/morphline/BlobHandler.java
>  e84dec1 
>   
> flume-ng-sinks/flume-ng-morphline-solr-sink/src/test/java/org/apache/flume/sink/solr/morphline/FlumeHttpServletRequestWrapper.java
>  9711a3a 
>   
> flume-ng-sinks/flume-ng-morphline-solr-sink/src/test/java/org/apache/flume/sink/solr/morphline/TestBlobHandler.java
>  3e7de99 
> 
> Diff: https://reviews.apache.org/r/15107/diff/
> 
> 
> Testing
> ---
> 
> Ran unit tests & some manual Test.
> 
> 
> Thanks,
> 
> Roshan Naik
> 
>



Re: Review Request 15107: Moving BlobHandler out of morphline sink and into HTTP source

2013-11-04 Thread Wolfgang Hoschek


> On Nov. 4, 2013, 8:42 p.m., Wolfgang Hoschek wrote:
> > flume-ng-sinks/flume-ng-morphline-solr-sink/src/main/java/org/apache/flume/sink/solr/morphline/BlobHandler.java,
> >  line 52
> > <https://reviews.apache.org/r/15107/diff/1/?file=374518#file374518line52>
> >
> > How about using the same implementation approach as for 
> > BlobDeserializer wrt. subclassing, etc?
> 
> Roshan Naik wrote:
> yes plan to do it sometime this week.

Ah, I now see that TestBlobHandler is already moved over - Pls ignore my prior 
comment.


- Wolfgang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15107/#review28135
---


On Nov. 5, 2013, 1:42 a.m., Roshan Naik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15107/
> ---
> 
> (Updated Nov. 5, 2013, 1:42 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2226
> https://issues.apache.org/jira/browse/FLUME-2226
> 
> 
> Repository: flume-git
> 
> 
> Description
> ---
> 
> - Moved BlobHandler out of morphline sink and into HTTP source along with 
> tests. 
> - Updated docs to reflect new FQCN
> - Retained dummy class for old FQCN compat
> 
> 
> Diffs
> -
> 
>   flume-ng-core/src/main/java/org/apache/flume/source/http/BlobHandler.java 
> PRE-CREATION 
>   
> flume-ng-core/src/test/java/org/apache/flume/source/http/FlumeHttpServletByteRequestWrapper.java
>  PRE-CREATION 
>   
> flume-ng-core/src/test/java/org/apache/flume/source/http/TestBlobHandler.java 
> PRE-CREATION 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3a3038c 
>   
> flume-ng-sinks/flume-ng-morphline-solr-sink/src/main/java/org/apache/flume/sink/solr/morphline/BlobHandler.java
>  e84dec1 
>   
> flume-ng-sinks/flume-ng-morphline-solr-sink/src/test/java/org/apache/flume/sink/solr/morphline/FlumeHttpServletRequestWrapper.java
>  9711a3a 
>   
> flume-ng-sinks/flume-ng-morphline-solr-sink/src/test/java/org/apache/flume/sink/solr/morphline/TestBlobHandler.java
>  3e7de99 
> 
> Diff: https://reviews.apache.org/r/15107/diff/
> 
> 
> Testing
> ---
> 
> Ran unit tests & some manual Test.
> 
> 
> Thanks,
> 
> Roshan Naik
> 
>



Re: Review Request 15107: Moving BlobHandler out of morphline sink and into HTTP source

2013-11-04 Thread Wolfgang Hoschek

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15107/#review28135
---



flume-ng-sinks/flume-ng-morphline-solr-sink/src/main/java/org/apache/flume/sink/solr/morphline/BlobHandler.java
<https://reviews.apache.org/r/15107/#comment54748>

How about using the same implementation approach as for BlobDeserializer 
wrt. subclassing, etc?


- Wolfgang Hoschek


On Oct. 31, 2013, 12:48 a.m., Roshan Naik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15107/
> ---
> 
> (Updated Oct. 31, 2013, 12:48 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2226
> https://issues.apache.org/jira/browse/FLUME-2226
> 
> 
> Repository: flume-git
> 
> 
> Description
> ---
> 
> - Moved BlobHandler out of morphline sink and into HTTP source along with 
> tests. 
> - Updated docs to reflect new FQCN
> - Retained dummy class for old FQCN compat
> 
> 
> Diffs
> -
> 
>   flume-ng-core/src/main/java/org/apache/flume/source/http/BlobHandler.java 
> PRE-CREATION 
>   
> flume-ng-core/src/test/java/org/apache/flume/source/http/FlumeHttpServletByteRequestWrapper.java
>  PRE-CREATION 
>   
> flume-ng-core/src/test/java/org/apache/flume/source/http/TestBlobHandler.java 
> PRE-CREATION 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst e38bb67 
>   
> flume-ng-sinks/flume-ng-morphline-solr-sink/src/main/java/org/apache/flume/sink/solr/morphline/BlobHandler.java
>  e84dec1 
>   
> flume-ng-sinks/flume-ng-morphline-solr-sink/src/test/java/org/apache/flume/sink/solr/morphline/FlumeHttpServletRequestWrapper.java
>  9711a3a 
>   
> flume-ng-sinks/flume-ng-morphline-solr-sink/src/test/java/org/apache/flume/sink/solr/morphline/TestBlobHandler.java
>  3e7de99 
> 
> Diff: https://reviews.apache.org/r/15107/diff/
> 
> 
> Testing
> ---
> 
> Ran unit tests & some manual Test.
> 
> 
> Thanks,
> 
> Roshan Naik
> 
>



Re: Review Request 15110: Moving the BlobDeserializer from Morphline Sink to flume-ng-core/.../serialization

2013-11-04 Thread Wolfgang Hoschek

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15110/#review28133
---



flume-ng-doc/sphinx/FlumeUserGuide.rst
<https://reviews.apache.org/r/15110/#comment54738>

ES changes should be moved into a separate JIRA as they are unrelated.



flume-ng-doc/sphinx/FlumeUserGuide.rst
<https://reviews.apache.org/r/15110/#comment54739>

ES changes should be moved into a separate JIRA as they are unrelated.


- Wolfgang Hoschek


On Nov. 1, 2013, 12:35 a.m., Roshan Naik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15110/
> ---
> 
> (Updated Nov. 1, 2013, 12:35 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2227
> https://issues.apache.org/jira/browse/FLUME-2227
> 
> 
> Repository: flume-git
> 
> 
> Description
> ---
> 
> - Moved BlobDeserializer out of morphline sink and into 
> core/.../serialization along with tests. 
> - Updated docs to reflect new FQCN
> - Retained dummy class for old FQCN compat
> 
> 
> Diffs
> -
> 
>   
> flume-ng-core/src/main/java/org/apache/flume/serialization/BlobDeserializer.java
>  PRE-CREATION 
>   
> flume-ng-core/src/test/java/org/apache/flume/serialization/ResettableTestByteInputStream.java
>  PRE-CREATION 
>   
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestBlobDeserializer.java
>  PRE-CREATION 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 3a3038c 
>   
> flume-ng-sinks/flume-ng-morphline-solr-sink/src/main/java/org/apache/flume/sink/solr/morphline/BlobDeserializer.java
>  12bdc40 
>   
> flume-ng-sinks/flume-ng-morphline-solr-sink/src/test/java/org/apache/flume/sink/solr/morphline/ResettableTestStringInputStream.java
>  e6ee9b9 
>   
> flume-ng-sinks/flume-ng-morphline-solr-sink/src/test/java/org/apache/flume/sink/solr/morphline/TestBlobDeserializer.java
>  6172c68 
> 
> Diff: https://reviews.apache.org/r/15110/diff/
> 
> 
> Testing
> ---
> 
> unit tests
> 
> 
> Thanks,
> 
> Roshan Naik
> 
>



[jira] [Commented] (FLUME-2227) Move the BlobDeserializer from Morphline Sink to flume-ng-core/.../serialization

2013-10-30 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808808#comment-13808808
 ] 

wolfgang hoschek commented on FLUME-2227:
-

Good idea as long as existing configs using the old FQCN don't break. New code 
and doc can use the new FQCN.

> Move the BlobDeserializer from Morphline Sink to 
> flume-ng-core/.../serialization
> 
>
> Key: FLUME-2227
> URL: https://issues.apache.org/jira/browse/FLUME-2227
> Project: Flume
>  Issue Type: Bug
>Affects Versions: v1.4.0
>Reporter: Roshan Naik
>
> This deserializer is more applicable to SpoolDir source and has no 
> dependencies on Morphline sink.





Re: Morphline Solr Sink Unit Test Error

2013-10-23 Thread Wolfgang Hoschek
Looks like Guava's ClassPath helper wasn't tested on Windows, after all (it 
works on Unix, of course).

This ClassPath class is manually shaded from guava-14.0.1 in order to not 
require (yet to allow) users to run guava > 11.0.2 because guava-11.0.2 is what 
a lot of Hadoop components bundle.

You could try replacing that manually shaded class with a version from guava-15 
and see if that helps.

Before doing so I'd recommend checking if ClassPath.from() from vanilla 
guava-15 blows up on windows path conventions.
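
For context, here is a small sketch of one way the JDK produces the "URI has an authority component" message quoted below: `new File(URI)` rejects any file URI that has an authority (host) part. Whether this is exactly the path taken inside the shaded ClassPath scanner is an assumption; `FileUriSketch` and the example URIs are made up for illustration.

```java
import java.io.File;
import java.net.URI;

// Hypothetical demo class, not part of Guava, Kite or Flume.
class FileUriSketch {

    // Returns the exception message if new File(uri) rejects the URI, or null if it is accepted.
    static String tryFile(String uriText) {
        try {
            new File(URI.create(uriText));
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A single slash after the scheme: no authority, so the constructor accepts it.
        System.out.println(tryFile("file:/tmp/foo.jar"));
        // Two slashes followed by a name: "server" is parsed as the authority,
        // so the constructor throws "URI has an authority component".
        System.out.println(tryFile("file://server/share/foo.jar"));
    }
}
```

Running a check like this against the URIs that the scanner actually sees on the Windows box would show whether the path convention is the trigger.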

Let me know your findings,
Wolfgang.

On Oct 23, 2013, at 7:39 PM, Roshan Naik wrote:

> Guys,
>  I am seeing the following stack traces on Windows. Any clue ?
> 
> 
> 
> 1) for TestMorphlineInterceptor.testNoOperation:
> 
> Error Message
> 
> URI has an authority component
> 
> Stacktrace
> 
> java.lang.IllegalArgumentException: URI has an authority component
>   at java.io.File.<init>(File.java:368)
>   at com.cloudera.cdk.morphline.shaded.com.google.common.reflect.ClassPath$Scanner.scan(ClassPath.java:276)
>   at com.cloudera.cdk.morphline.shaded.com.google.common.reflect.ClassPath$Scanner.scanJar(ClassPath.java:339)
>   at com.cloudera.cdk.morphline.shaded.com.google.common.reflect.ClassPath$Scanner.scanFrom(ClassPath.java:288)
>   at com.cloudera.cdk.morphline.shaded.com.google.common.reflect.ClassPath$Scanner.scan(ClassPath.java:276)
>   at com.cloudera.cdk.morphline.shaded.com.google.common.reflect.ClassPath.from(ClassPath.java:84)
>   at com.cloudera.cdk.morphline.api.MorphlineContext.getTopLevelClasses(MorphlineContext.java:100)
>   at com.cloudera.cdk.morphline.api.MorphlineContext.importCommandBuilders(MorphlineContext.java:68)
>   at com.cloudera.cdk.morphline.stdlib.Pipe.<init>(Pipe.java:41)
>   at com.cloudera.cdk.morphline.stdlib.PipeBuilder.build(PipeBuilder.java:39)
>   at com.cloudera.cdk.morphline.base.Compiler.compile(Compiler.java:127)
>   at com.cloudera.cdk.morphline.base.Compiler.compile(Compiler.java:56)
>   at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.configure(MorphlineHandlerImpl.java:90)
>   at org.apache.flume.sink.solr.morphline.MorphlineInterceptor$LocalMorphlineInterceptor.<init>(MorphlineInterceptor.java:140)
>   at org.apache.flume.sink.solr.morphline.MorphlineInterceptor.<init>(MorphlineInterceptor.java:55)
>   at org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder.build(MorphlineInterceptor.java:117)
>   at org.apache.flume.sink.solr.morphline.TestMorphlineInterceptor.build(TestMorphlineInterceptor.java:141)
>   at org.apache.flume.sink.solr.morphline.TestMorphlineInterceptor.testNoOperation(TestMorphlineInterceptor.java:46)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
>   at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
>

[jira] [Updated] (FLUME-2213) MorphlineInterceptor should share metric registry across threads for better (aggregate) reporting

2013-10-13 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2213:


Attachment: FLUME-2213-v3.patch

add some more metrics

> MorphlineInterceptor should share metric registry across threads for better 
> (aggregate) reporting
> -
>
> Key: FLUME-2213
> URL: https://issues.apache.org/jira/browse/FLUME-2213
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
>    Assignee: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2213-v3.patch
>
>
> Otherwise each thread will have its own registry, which isn't great for 
> getting a sense of the overall metrics, i.e. aggregated across all threads.





[jira] [Updated] (FLUME-2213) MorphlineInterceptor should share metric registry across threads for better (aggregate) reporting

2013-10-13 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2213:


Attachment: (was: FLUME-2213-v2.patch)

> MorphlineInterceptor should share metric registry across threads for better 
> (aggregate) reporting
> -
>
> Key: FLUME-2213
> URL: https://issues.apache.org/jira/browse/FLUME-2213
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
>    Assignee: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2213-v3.patch
>
>
> Otherwise each thread will have its own registry, which isn't great for 
> getting a sense of the overall metrics, i.e. aggregated across all threads.





[jira] [Updated] (FLUME-2213) MorphlineInterceptor should share metric registry across threads for better (aggregate) reporting

2013-10-12 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2213:


Attachment: (was: FLUME-2213-v1.patch)

> MorphlineInterceptor should share metric registry across threads for better 
> (aggregate) reporting
> -
>
> Key: FLUME-2213
> URL: https://issues.apache.org/jira/browse/FLUME-2213
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
>    Assignee: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2213-v2.patch
>
>
> Otherwise each thread will have its own registry, which isn't great for 
> getting a sense of the overall metrics, i.e. aggregated across all threads.





[jira] [Updated] (FLUME-2213) MorphlineInterceptor should share metric registry across threads for better (aggregate) reporting

2013-10-12 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2213:


Attachment: FLUME-2213-v2.patch

better patch, now also shared registry among sinks that use the same 
morphlineFile and morphlineId
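
As a rough illustration of the sharing idea (not the code in the attached patch), the sketch below hands out one registry per morphlineFile/morphlineId pair, so that every thread using the same pair updates the same counters. The `SharedRegistries` and `Registry` names, the key format and the counter-only registry are assumptions made for this example; they are not Flume or morphline APIs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical helper: one shared registry per (morphlineFile, morphlineId) pair. */
final class SharedRegistries {

    /** Minimal stand-in for a metrics registry that only knows counters. */
    static final class Registry {
        private final ConcurrentMap<String, LongAdder> counters = new ConcurrentHashMap<>();

        void increment(String name) {
            counters.computeIfAbsent(name, k -> new LongAdder()).increment();
        }

        long count(String name) {
            LongAdder adder = counters.get(name);
            return adder == null ? 0L : adder.sum();
        }
    }

    private static final ConcurrentMap<String, Registry> REGISTRIES = new ConcurrentHashMap<>();

    private SharedRegistries() {}

    /** Returns the same Registry instance for the same file/id pair, from any thread. */
    static Registry getOrCreate(String morphlineFile, String morphlineId) {
        // Simplistic key for the sketch; assumes '@' does not make two pairs collide.
        String key = morphlineFile + "@" + morphlineId;
        return REGISTRIES.computeIfAbsent(key, k -> new Registry());
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                Registry registry = getOrCreate("morphline.conf", "morphline1");
                for (int n = 0; n < 1000; n++) {
                    registry.increment("events");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        // All four threads wrote into one registry, so the total is aggregated.
        System.out.println("events=" + getOrCreate("morphline.conf", "morphline1").count("events"));
    }
}
```

With a per-thread registry each worker would report 1000 events on its own; with the shared lookup the single registry reports the aggregate across all threads.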

> MorphlineInterceptor should share metric registry across threads for better 
> (aggregate) reporting
> -
>
> Key: FLUME-2213
> URL: https://issues.apache.org/jira/browse/FLUME-2213
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
>    Assignee: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2213-v2.patch
>
>
> Otherwise each thread will have its own registry, which isn't great for 
> getting a sense of the overall metrics, i.e. aggregated across all threads.





[jira] [Updated] (FLUME-2213) MorphlineInterceptor should share metric registry across threads for better (aggregate) reporting

2013-10-12 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2213:


Affects Version/s: v1.4.0
Fix Version/s: (was: v1.4.0)
   v1.5.0
   v1.4.1

> MorphlineInterceptor should share metric registry across threads for better 
> (aggregate) reporting
> -
>
> Key: FLUME-2213
> URL: https://issues.apache.org/jira/browse/FLUME-2213
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
>    Assignee: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2213-v1.patch
>
>
> Otherwise each thread will have its own registry, which isn't great for 
> getting a sense of the overall metrics, i.e. aggregated across all threads.





[jira] [Updated] (FLUME-2213) MorphlineInterceptor should share metric registry across threads for better (aggregate) reporting

2013-10-12 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2213:


Attachment: FLUME-2213-v1.patch

> MorphlineInterceptor should share metric registry across threads for better 
> (aggregate) reporting
> -
>
> Key: FLUME-2213
> URL: https://issues.apache.org/jira/browse/FLUME-2213
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
>    Assignee: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2213-v1.patch
>
>
> Otherwise each thread will have its own registry, which isn't great for 
> getting a sense of the overall metrics, i.e. aggregated across all threads.





[jira] [Created] (FLUME-2213) MorphlineInterceptor should share metric registry across threads for better (aggregate) reporting

2013-10-12 Thread wolfgang hoschek (JIRA)
wolfgang hoschek created FLUME-2213:
---

 Summary: MorphlineInterceptor should share metric registry across 
threads for better (aggregate) reporting
 Key: FLUME-2213
 URL: https://issues.apache.org/jira/browse/FLUME-2213
 Project: Flume
  Issue Type: Improvement
  Components: Sinks+Sources
Reporter: wolfgang hoschek
Assignee: wolfgang hoschek
 Fix For: v1.4.0


Otherwise each thread will have its own registry, which isn't great for 
getting a sense of the overall metrics, i.e. aggegated across all threads.





[jira] [Updated] (FLUME-2213) MorphlineInterceptor should share metric registry across threads for better (aggregate) reporting

2013-10-12 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2213:


Description: Otherwise each thread will have its own registry, which 
isn't great for getting a sense of the overall metrics, i.e. aggregated across 
all threads.  (was: Otherwise each thread will have its own registry, which 
isn't great for getting a sense of the overall metrics, i.e. aggegated across 
all threads.)

> MorphlineInterceptor should share metric registry across threads for better 
> (aggregate) reporting
> -
>
> Key: FLUME-2213
> URL: https://issues.apache.org/jira/browse/FLUME-2213
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Reporter: wolfgang hoschek
>Assignee: wolfgang hoschek
> Fix For: v1.4.0
>
>
> Otherwise each thread will have its own registry, which isn't great for 
> getting a sense of the overall metrics, i.e. aggregated across all threads.





[jira] [Updated] (FLUME-2212) upgrade to Morphlines-0.8.0

2013-10-11 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2212:


Attachment: FLUME-2212-v1.patch

> upgrade to Morphlines-0.8.0
> ---
>
> Key: FLUME-2212
> URL: https://issues.apache.org/jira/browse/FLUME-2212
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2212-v1.patch
>
>
> Upgrade to morphlines-0.8.0. Release notes are here: 
> http://cloudera.github.io/cdk/docs/current/release_notes.html





[jira] [Created] (FLUME-2212) upgrade to Morphlines-0.8.0

2013-10-11 Thread wolfgang hoschek (JIRA)
wolfgang hoschek created FLUME-2212:
---

 Summary: upgrade to Morphlines-0.8.0
 Key: FLUME-2212
 URL: https://issues.apache.org/jira/browse/FLUME-2212
 Project: Flume
  Issue Type: Improvement
  Components: Sinks+Sources
Affects Versions: v1.4.0
Reporter: wolfgang hoschek
 Fix For: v1.4.1, v1.5.0


Upgrade to morphlines-0.8.0. Release notes are here: 
http://cloudera.github.io/cdk/docs/current/release_notes.html





[jira] [Commented] (FLUME-1988) Add Support for Additional Deserializers for SpoolingDirectorySource

2013-09-27 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780509#comment-13780509
 ] 

wolfgang hoschek commented on FLUME-1988:
-

Splitting an input stream into events in a configurable and extensible way 
sounds like a good idea. 

An alternative would be to address this problem (and many similar problems) 
by writing a MorphlineDeserializer that implements a java.io.InputStream on top 
of the SpoolingDirectorySource and feeds that InputStream into a configurable 
morphline, which in turn contains a readMultiLine command. You can then easily 
replace readMultiLine with a command that splits on a character sequence, and 
so on. There are many other flavours of the same byte stream -> event splitting 
theme, and this way individual commands can be composed in a morphline, which 
makes them more powerful, flexible and reusable. 

http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html#readMultiLine
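
To make the "split on a character sequence" variant concrete, here is a minimal sketch, assuming the delimiter is configured as comma-separated ASCII codes (for example 13,10 for \r\n, as in the issue description quoted below). `DelimiterSplitter` and its methods are hypothetical names for this example only; this is neither an existing morphline command nor the proposed deserializer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: cut a byte stream into events on a fixed byte sequence. */
final class DelimiterSplitter {

    /** Parses a spec such as "13,10" into the bytes {'\r', '\n'}. */
    static byte[] parseDelimiter(String spec) {
        String[] parts = spec.split(",");
        byte[] delimiter = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            delimiter[i] = (byte) Integer.parseInt(parts[i].trim());
        }
        return delimiter;
    }

    /** Reads the stream to its end and returns one string per delimited event. */
    static List<String> split(InputStream in, byte[] delimiter) throws IOException {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> events = new ArrayList<>();
        byte[] buf = new byte[256];
        int len = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (len == buf.length) {
                buf = Arrays.copyOf(buf, len * 2);
            }
            buf[len++] = (byte) b;
            if (endsWith(buf, len, delimiter)) {
                // Emit the event without the trailing delimiter bytes.
                events.add(new String(buf, 0, len - delimiter.length, StandardCharsets.UTF_8));
                len = 0;
            }
        }
        if (len > 0) {
            // Trailing data without a final delimiter becomes the last event.
            events.add(new String(buf, 0, len, StandardCharsets.UTF_8));
        }
        return events;
    }

    private static boolean endsWith(byte[] buf, int len, byte[] delimiter) {
        if (len < delimiter.length) {
            return false;
        }
        int offset = len - delimiter.length;
        for (int i = 0; i < delimiter.length; i++) {
            if (buf[offset + i] != delimiter[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "first line\nstill first\r\nsecond\r\n".getBytes(StandardCharsets.UTF_8);
        List<String> events = split(new ByteArrayInputStream(data), parseDelimiter("13,10"));
        System.out.println(events.size() + " events");
    }
}
```

A bare \n inside an event is kept, because only the full \r\n sequence ends an event here. A real deserializer would also need the mark/reset handling that the spooling source relies on, which this sketch leaves out.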


> Add Support for Additional Deserializers for SpoolingDirectorySource
> 
>
> Key: FLUME-1988
> URL: https://issues.apache.org/jira/browse/FLUME-1988
> Project: Flume
>  Issue Type: New Feature
>  Components: Docs, Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Israel Ekpo
>Assignee: Israel Ekpo
>  Labels: serializers
> Attachments: EventDeserializerType.java, 
> RegexDelimiterDeSerializer.java, ResettableTestStringInputStream.java, 
> TestRegexDelimiterDeSerializer.java
>
>
> There are certain use cases for SpoolingDirectorySource where the events in 
> the log file are not delimited with newline characters.
> Certain log files that contain stack traces, xml documents and pretty JSON 
> strings seem to contain multiple new line characters within each event.
> We can use alternative logic such as specific characters, strings or regular 
> expressions to determine when the event is complete.
> Hence I am proposing the following new deserializers based on 
> org.apache.flume.serialization.LineDeserializer
> # org.apache.flume.serialization.RegexDelimiterDeSerializer
> Allows the user to specify a regular expression that is a delimiter for 
> events within the log file
> # org.apache.flume.serialization.CharSequenceDelimiterDeSerializer
> Allows the user to specify a comma separated character sequence that is a 
> delimiter for events within the log file
> The user will specify an integer for the ASCII characters and we will use 
> that as the delimiter.
> For example support for \r\n could be specified as 13,10
> A list of codes is available at http://www.asciitable.com/
> We will also need to update the user guide with examples on how to configure 
> and specify a custom deserializer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [ANNOUNCE] New Flume Committer - Wolfgang Hoschek

2013-09-25 Thread Wolfgang Hoschek
Thanks everybody! Looking forward to a good ride.

Wolfgang.

On Sep 24, 2013, at 3:39 PM, Hari Shreedharan wrote:

> On behalf of the Apache Flume PMC, I am excited to welcome Wolfgang Hoschek 
> as a committer on the Apache Flume project. Wolfgang contributed a new sink 
> with the ability to do heavyweight ETL-style processing and writing to Apache 
> Solr indices.
> 
> Congratulations and Welcome, Wolfgang!
> 
> 
> Cheers,
> Hari Shreedharan



[jira] [Updated] (FLUME-2185) Upgrade morphlines to 0.7.0

2013-09-11 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2185:


Fix Version/s: v1.4.1

> Upgrade morphlines to 0.7.0
> ---
>
> Key: FLUME-2185
> URL: https://issues.apache.org/jira/browse/FLUME-2185
> Project: Flume
>  Issue Type: New Feature
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2185-v1.patch, FLUME-2185-v2.patch
>
>
> Now that morphlines-0.7.0 has been released we should upgrade to that. 
> Release notes are here: 
> http://cloudera.github.io/cdk/docs/current/release_notes.html



[jira] [Updated] (FLUME-2185) Upgrade morphlines to 0.7.0

2013-09-08 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2185:


Attachment: FLUME-2185-v2.patch

> Upgrade morphlines to 0.7.0
> ---
>
> Key: FLUME-2185
> URL: https://issues.apache.org/jira/browse/FLUME-2185
> Project: Flume
>  Issue Type: New Feature
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
> Fix For: v1.5.0
>
> Attachments: FLUME-2185-v1.patch, FLUME-2185-v2.patch
>
>
> Now that morphlines-0.7.0 has been released we should upgrade to that. 
> Release notes are here: 
> http://cloudera.github.io/cdk/docs/current/release_notes.html



[jira] [Commented] (FLUME-2185) Upgrade morphlines to 0.7.0

2013-09-08 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761535#comment-13761535
 ] 

wolfgang hoschek commented on FLUME-2185:
-

FYI, the patch assumes that FLUME-2184 has already been applied.

> Upgrade morphlines to 0.7.0
> ---
>
> Key: FLUME-2185
> URL: https://issues.apache.org/jira/browse/FLUME-2185
> Project: Flume
>  Issue Type: New Feature
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: wolfgang hoschek
> Fix For: v1.5.0
>
> Attachments: FLUME-2185-v1.patch, FLUME-2185-v2.patch
>
>
> Now that morphlines-0.7.0 has been released we should upgrade to that. 
> Release notes are here: 
> http://cloudera.github.io/cdk/docs/current/release_notes.html



[jira] [Updated] (FLUME-2185) Upgrade morphlines to 0.7.0

2013-09-08 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2185:


Attachment: FLUME-2185-v1.patch

> Upgrade morphlines to 0.7.0
> ---
>
> Key: FLUME-2185
> URL: https://issues.apache.org/jira/browse/FLUME-2185
> Project: Flume
>  Issue Type: New Feature
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
> Fix For: v1.5.0
>
> Attachments: FLUME-2185-v1.patch
>
>
> Now that morphlines-0.7.0 has been released we should upgrade to that. 
> Release notes are here: 
> http://cloudera.github.io/cdk/docs/current/release_notes.html



[jira] [Created] (FLUME-2185) Upgrade morphlines to 0.7.0

2013-09-08 Thread wolfgang hoschek (JIRA)
wolfgang hoschek created FLUME-2185:
---

 Summary: Upgrade morphlines to 0.7.0
 Key: FLUME-2185
 URL: https://issues.apache.org/jira/browse/FLUME-2185
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Affects Versions: v1.4.0
Reporter: wolfgang hoschek
 Fix For: v1.5.0


Now that morphlines-0.7.0 has been released we should upgrade to that. Release 
notes are here: http://cloudera.github.io/cdk/docs/current/release_notes.html



[jira] [Commented] (FLUME-2184) flume-ng-morphline-solr-sink Build failing due to incorrect hadoop-common dependency declaration

2013-09-08 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761527#comment-13761527
 ] 

wolfgang hoschek commented on FLUME-2184:
-

Ok, thanks!
+1 to the patch

> flume-ng-morphline-solr-sink Build failing due to incorrect hadoop-common 
> dependency declaration
> 
>
> Key: FLUME-2184
> URL: https://issues.apache.org/jira/browse/FLUME-2184
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Jagat Singh
>Priority: Minor
> Attachments: FLUME-2184-0.patch
>
>
> flume-ng-morphline-solr-sink build fails due to incorrect dependency 
> declaration.
> Downloaded the code from flume git repo.
> Trying to build it
> mvn clean install -DskipTests
> It gives me this error.
> The project expects 1.0.1 version of jar which was never there at maven 
> central.
> http://search.maven.org/#search|gav|1|g%3A%22org.apache.hadoop%22%20AND%20a%3A%22hadoop-common%22
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-remote-resources-plugin:1.1:process (default) 
> on project flume-ng-morphline-solr-sink: Failed to resolve dependencies for 
> one or more projects in the reactor. Reason: Missing:
> [ERROR] --
> [ERROR] 1) org.apache.hadoop:hadoop-common:jar:1.0.1
> [ERROR]
> [ERROR] Try downloading the file manually from the project website.
> [ERROR] Path to dependency:
> [ERROR] 1) 
> org.apache.flume.flume-ng-sinks:flume-ng-morphline-solr-sink:jar:1.5.0-SNAPSHOT
> [ERROR] 2) com.cloudera.cdk:cdk-morphlines-all:pom:0.6.0
> [ERROR] 3) com.cloudera.cdk:cdk-morphlines-solr-core:jar:0.6.0
> [ERROR] 4) org.apache.solr:solr-core:jar:4.4.0
> [ERROR] 5) org.apache.hadoop:hadoop-common:jar:1.0.1
> Details here 
> http://mail-archives.apache.org/mod_mbox/flume-dev/201309.mbox/%3CCAJ-d8Xep4LcoSE0Yo%3D1w17CewQFzDU%2B5KQDa3DGZDT-oQ3XHYg%40mail.gmail.com%3E
> Attaching patch to fix it.



[jira] [Commented] (FLUME-2184) flume-ng-morphline-solr-sink Build failing due to incorrect hadoop-common dependency declaration

2013-09-07 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761204#comment-13761204
 ] 

wolfgang hoschek commented on FLUME-2184:
-

Looks like the underlying problem is in the top level flume/pom.xml, which 
currently reads:

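{code}
<profile>
  <id>hadoop-1.0</id>
...
<hadoop.version>1.0.1</hadoop.version>
...
</profile>
...
<profile>
  <id>hadoop-2</id>
...
<hadoop.version>2.0.0-alpha</hadoop.version>
...
</profile>
...
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
{code}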

Also note that the following works fine without your patch:
mvn -Dhadoop.profile=2 clean test -DskipTests


> flume-ng-morphline-solr-sink Build failing due to incorrect hadoop-common 
> dependency declaration
> 
>
> Key: FLUME-2184
> URL: https://issues.apache.org/jira/browse/FLUME-2184
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Jagat Singh
>Priority: Minor
> Attachments: FLUME-2184-0.patch
>
>
> flume-ng-morphline-solr-sink build fails due to incorrect dependency 
> declaration.
> Downloaded the code from flume git repo.
> Trying to build it
> mvn clean install -DskipTests
> It gives me this error.
> The project expects 1.0.1 version of jar which was never there at maven 
> central.
> http://search.maven.org/#search|gav|1|g%3A%22org.apache.hadoop%22%20AND%20a%3A%22hadoop-common%22
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-remote-resources-plugin:1.1:process (default) 
> on project flume-ng-morphline-solr-sink: Failed to resolve dependencies for 
> one or more projects in the reactor. Reason: Missing:
> [ERROR] --
> [ERROR] 1) org.apache.hadoop:hadoop-common:jar:1.0.1
> [ERROR]
> [ERROR] Try downloading the file manually from the project website.
> [ERROR] Path to dependency:
> [ERROR] 1) 
> org.apache.flume.flume-ng-sinks:flume-ng-morphline-solr-sink:jar:1.5.0-SNAPSHOT
> [ERROR] 2) com.cloudera.cdk:cdk-morphlines-all:pom:0.6.0
> [ERROR] 3) com.cloudera.cdk:cdk-morphlines-solr-core:jar:0.6.0
> [ERROR] 4) org.apache.solr:solr-core:jar:4.4.0
> [ERROR] 5) org.apache.hadoop:hadoop-common:jar:1.0.1
> Details here 
> http://mail-archives.apache.org/mod_mbox/flume-dev/201309.mbox/%3CCAJ-d8Xep4LcoSE0Yo%3D1w17CewQFzDU%2B5KQDa3DGZDT-oQ3XHYg%40mail.gmail.com%3E
> Attaching patch to fix it.



Re: New Features Proposed for Apache Flume

2013-08-28 Thread Wolfgang Hoschek
Re: GrokInterceptor

This functionality is already available in the form of the Apache Flume 
MorphlineInterceptor [1] with the grok command [2]. While grok is very useful, 
consider that grok alone often isn't enough - you typically need some other log 
event processing commands as well, for example as contained in morphlines [3].
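
For readers unfamiliar with grok, the underlying idea is to turn a free-form log line into named fields. The sketch below shows that idea with plain java.util.regex named groups; it is not the morphline grok command, and the `GrokLikeParser` class and its simplified access-log pattern are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of grok-style field extraction using named regex groups. */
final class GrokLikeParser {

    private static final String[] FIELDS = {
        "clientip", "timestamp", "verb", "request", "response", "bytes"
    };

    // Simplified access-log pattern; real grok pattern dictionaries are far richer.
    private static final Pattern ACCESS_LOG = Pattern.compile(
        "^(?<clientip>\\S+) \\S+ \\S+ \\[(?<timestamp>[^\\]]+)\\] "
        + "\"(?<verb>[A-Z]+) (?<request>\\S+) [^\"]*\" (?<response>\\d{3}) (?<bytes>\\d+|-)");

    /** Returns the extracted fields, or an empty map if the line does not match. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher matcher = ACCESS_LOG.matcher(line);
        if (matcher.find()) {
            for (String name : FIELDS) {
                fields.put(name, matcher.group(name));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [28/Aug/2013:10:15:32 -0700] "
            + "\"GET /index.html HTTP/1.1\" 200 2326";
        System.out.println(parse(line));
    }
}
```

Extraction like this is usually only the first step. Converting the timestamp, dropping unwanted events or adding fields are separate processing steps, which is why grok alone often isn't enough.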

Re: FileSource

True file tailing would be great. 

Merging multiple lines into one event can already be done with the 
MorphlineInterceptor with the readMultiLine command [4]. Or maybe embed a 
morphline directly into that new FileSource?
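
As a sketch of what merging multiple lines into one event means in practice, the code below appends every line that matches a continuation pattern to the previous event. It only mimics the general idea behind readMultiLine; the `MultiLineMerger` class and its stack-trace pattern are assumptions for this example, not the morphline implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: fold continuation lines into the preceding event. */
final class MultiLineMerger {

    // Lines that look like part of a Java stack trace (assumed pattern for the demo).
    static final Pattern STACK_TRACE_CONTINUATION =
        Pattern.compile("^(\\s+at .+|Caused by:.+)");

    /** Lines matching the pattern are appended to the previous event. */
    static List<String> merge(List<String> lines, Pattern continuation) {
        List<String> events = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            boolean isContinuation = continuation.matcher(line).find();
            if (isContinuation && current != null) {
                current.append('\n').append(line);
            } else {
                // A non-matching line (or a leading continuation) starts a new event.
                if (current != null) {
                    events.add(current.toString());
                }
                current = new StringBuilder(line);
            }
        }
        if (current != null) {
            events.add(current.toString());
        }
        return events;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "ERROR failed",
            "\tat Foo.bar(Foo.java:1)",
            "Caused by: java.io.IOException: disk",
            "INFO recovered");
        System.out.println(merge(lines, STACK_TRACE_CONTINUATION).size() + " events");
    }
}
```
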

Re: GeoIPInterceptor

Seems to me that it would be more flexible, powerful and reusable to add this 
kind of functionality as a morphline command - contributions welcome!

Finally, a word of caution: Maxmind is a good geo db, and I've used it before, 
but it has some LGPL issues that may or may not be workable in this context. 
The Maxmind db fits into RAM, so Lucene seems like overkill here - you can do 
fast maxmind lookups directly without Lucene.
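
To illustrate why an index is not needed for this kind of lookup, here is a minimal sketch of an in-memory range table: IPv4 ranges are kept in a sorted map keyed by range start, and a lookup is a single floorEntry call. The `InMemoryGeoLookup` class, the non-overlapping-ranges assumption and the sample ranges and country codes are invented for this example; it does not read the Maxmind file format or use the Maxmind API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical sketch: IPv4 range lookup held entirely in memory. */
final class InMemoryGeoLookup {

    private static final class Range {
        final long end;
        final String country;

        Range(long end, String country) {
            this.end = end;
            this.country = country;
        }
    }

    // Keyed by range start; assumes the ranges do not overlap.
    private final TreeMap<Long, Range> rangesByStart = new TreeMap<>();

    void addRange(String startIp, String endIp, String country) {
        rangesByStart.put(toLong(startIp), new Range(toLong(endIp), country));
    }

    /** Finds the range with the greatest start <= ip, then checks its end. */
    Optional<String> lookup(String ip) {
        long value = toLong(ip);
        Map.Entry<Long, Range> entry = rangesByStart.floorEntry(value);
        if (entry == null || value > entry.getValue().end) {
            return Optional.empty();
        }
        return Optional.of(entry.getValue().country);
    }

    /** Converts dotted-quad IPv4 text into an unsigned 32-bit value held in a long. */
    static long toLong(String ipv4) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ipv4);
        }
        long value = 0;
        for (String octet : octets) {
            int part = Integer.parseInt(octet);
            if (part < 0 || part > 255) {
                throw new IllegalArgumentException("octet out of range: " + ipv4);
            }
            value = (value << 8) | part;
        }
        return value;
    }

    public static void main(String[] args) {
        InMemoryGeoLookup geo = new InMemoryGeoLookup();
        // Documentation-only address blocks with placeholder country codes.
        geo.addRange("192.0.2.0", "192.0.2.255", "AA");
        geo.addRange("198.51.100.0", "198.51.100.255", "BB");
        System.out.println(geo.lookup("192.0.2.17").orElse("unknown"));
    }
}
```

Each lookup costs O(log n) in the number of ranges and involves no I/O.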

[1] http://flume.apache.org/FlumeUserGuide.html#morphline-interceptor
[2] 
http://cloudera.github.io/cdk/docs/0.6.0/cdk-morphlines/morphlinesReferenceGuide.html#grok
[3] http://cloudera.github.io/cdk/docs/0.6.0/cdk-morphlines/index.html
[4] 
http://cloudera.github.io/cdk/docs/0.6.0/cdk-morphlines/morphlinesReferenceGuide.html#readMultiLine

Wolfgang.

> 
> *FileSource*
> 
> Using the Tailer feature from Apache Commons I/O utility [1], we can tail
> specific files for events.
> 
> This allows us to, regardless of the operating system, have the ability to
> watch files for future events as they occur.
> 
> It also allows us to step in and determine if two or more events should be
> merged into one event if newline characters are present in an event.
> 
> We can configure certain regular expressions that determine if a specific
> line is a new event or part of the previous event.
> 
> Essentially, this source will have the ability to merge multiple lines into
> one event before it is passed on to interceptors.
> 
> It has been complicated to group multiple lines into a single event with the
> Spooling Directory Source or Exec Source. I tried creating custom
> deserializers but it was hard to get around the logic used to parse the
> files.
> 
> Using the Spooling Directory also means we cannot watch the original files
> so we need a background process to copy over the log files into the
> spooling directory which requires additional setup.
> 
> The tail command is also not available on all operating systems out of the
> box.
> 
> 
> *GrokInterceptor*
> 
> With this interceptor we can parse semi-structured and unstructured text and
> log data in the headers and body of the event into something structured
> that can be easily queried.
> I plan to use the information [2] and [3] for this.
> With this interceptor, we can extract HTTP response codes, response times,
> user agents, IP addresses and a whole bunch of useful data points from free
> form text.
> 
> 
> 
> *GeoIPInterceptor*
> 
> This is for IP intelligence.
> 
> This interceptor will allow us to use the value of an IP address in the
> event header or body of the request to estimate the geographical location
> of the IP address.
> 
> Using the database available here [4], we can inject the two-letter code or
> country name of the IP address into the event.
> 
> We can also deduce other values such as city name, postalCode, latitude,
> longitude, Internet Service Provider and Organization name.
> 
> This can be very helpful in analyzing traffic patterns and target audience
> from webserver or application logs.
> 
> The database is loaded into a Lucene index when the agent is started up.
> The index is only created once if it does not already exist.
> 
> As the interceptor comes across events, it maps the IP address to a variety
> of values that can be injected into the events.
> 
> 
> 
> *RedisSink*
> 
> This can provide another option for setting up a fan-in and/or fan-out
> architecture.
> 
> The RedisSink can serve as a queue that is used as a source by another
> agent down the line.
> 
> *References*
> [1]
> http://commons.apache.org/proper/commons-io/javadocs/api-release/org/apache/commons/io/input/Tailer.html
> [2] https://github.com/NFLabs/java-grok
> [3] http://www.anthonycorbacho.net/portfolio/grok-pattern/
> [4] http://dev.maxmind.com/geoip/legacy/geolite/#Downloads
> [5] http://dev.maxmind.com/geoip/legacy/csv/
> [6] http://redis.io/documentation
> [7] https://github.com/xetorthio/jedis
> 
> *Author and Instructor for the Upcoming Book and Lecture Series*
> *Massive Log Data Aggregation, Processing, Searching and Visualization with
> Open Source Software*
> *http://massivelogdata.com*



[jira] [Comment Edited] (FLUME-2174) Integration of morphline solr sink puts Lucene/Solr dependencies on the default classpath

2013-08-21 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747113#comment-13747113
 ] 

wolfgang hoschek edited comment on FLUME-2174 at 8/22/13 1:14 AM:
--

The exclusions are there for a reason. The excluded jars should not get pulled 
in during the tests. We verify that the tests pass with those exclusions.

  was (Author: whoschek):
The exclusions should not get pulled in during the tests either.
  
> Integration of morphline solr sink puts Lucene/Solr dependencies on the 
> default classpath
> -
>
> Key: FLUME-2174
> URL: https://issues.apache.org/jira/browse/FLUME-2174
> Project: Flume
>  Issue Type: Bug
>  Components: Build, Docs
>Affects Versions: v1.4.0
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
> Fix For: v1.4.1
>
> Attachments: 
> 0001-FLUME-2174.-Integration-of-morphline-solr-sink-puts-.patch
>
>
> The integration of morphline solr sink pulls Apache Lucene/Solr dependencies 
> on the default classpath and ships them in the default binary distribution. 
> This has an unfortunate side effect of breaking the deployments wishing to 
> utilize the sink against the version of Lucene/Solr other than the ones 
> shipped and also affects Elastic Search Sink.



[jira] [Commented] (FLUME-2174) Integration of morphline solr sink puts Lucene/Solr dependencies on the default classpath

2013-08-21 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747113#comment-13747113
 ] 

wolfgang hoschek commented on FLUME-2174:
-

The exclusions should not get pulled in during the tests either.

> Integration of morphline solr sink puts Lucene/Solr dependencies on the 
> default classpath
> -
>
> Key: FLUME-2174
> URL: https://issues.apache.org/jira/browse/FLUME-2174
> Project: Flume
>  Issue Type: Bug
>  Components: Build, Docs
>Affects Versions: v1.4.0
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
> Fix For: v1.4.1
>
> Attachments: 
> 0001-FLUME-2174.-Integration-of-morphline-solr-sink-puts-.patch
>
>
> The integration of morphline solr sink pulls Apache Lucene/Solr dependencies 
> on the default classpath and ships them in the default binary distribution. 
> This has an unfortunate side effect of breaking the deployments wishing to 
> utilize the sink against the version of Lucene/Solr other than the ones 
> shipped and also affects Elastic Search Sink.



[jira] [Commented] (FLUME-2174) Integration of morphline solr sink puts Lucene/Solr dependencies on the default classpath

2013-08-21 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747105#comment-13747105
 ] 

wolfgang hoschek commented on FLUME-2174:
-

I think the patch shouldn't remove the exclusions for 
geronimo-stax-api_1.0_spec and xercesImpl.

> Integration of morphline solr sink puts Lucene/Solr dependencies on the 
> default classpath
> -
>
> Key: FLUME-2174
> URL: https://issues.apache.org/jira/browse/FLUME-2174
> Project: Flume
>  Issue Type: Bug
>  Components: Build, Docs
>Affects Versions: v1.4.0
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
> Fix For: v1.4.1
>
> Attachments: 
> 0001-FLUME-2174.-Integration-of-morphline-solr-sink-puts-.patch
>
>
> The integration of morphline solr sink pulls Apache Lucene/Solr dependencies 
> on the default classpath and ships them in the default binary distribution. 
> This has an unfortunate side effect of breaking the deployments wishing to 
> utilize the sink against the version of Lucene/Solr other than the ones 
> shipped and also affects Elastic Search Sink.



[jira] [Commented] (FLUME-1687) ApacheSolrSink

2013-07-16 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710118#comment-13710118
 ] 

wolfgang hoschek commented on FLUME-1687:
-

My understanding of FLUME-1687 is that it simply forwards the flume headers 
as-is to Solr, i.e. it essentially expects an upstream component to send flume 
events that conform and are formatted exactly as required by Solr. I think it 
also doesn't support SolrCloud.

In contrast, Morphline Solr Sink is well suited for use cases that stream raw 
data into HDFS (via the HdfsSink) and simultaneously extract, transform and 
load the same data into Solr. In particular, the Morphline Solr Sink can 
process arbitrary heterogeneous raw data from disparate data sources and turn 
it into a data model that is useful to Search applications. The ETL 
functionality is customizable using a morphline configuration file that defines 
a chain of pluggable transformation commands that pipe event records from one 
command to another. The Morphline Solr Sink also supports SolrCloud and 
transactional batching for more scalability, as well as Solr collection 
aliases (e.g. for transparent expiry of old index partitions).
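
The "chain of pluggable transformation commands" can be pictured with a small sketch in which each step receives a record, may modify it, and either passes it on or drops it. The `CommandChain` and `Step` types and the two sample steps are hypothetical and much simpler than the real morphline Command API, which this example does not use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch: records piped through a chain of transformation steps. */
final class CommandChain {

    /** A step transforms a record, or returns Optional.empty() to drop it. */
    interface Step extends Function<Map<String, String>, Optional<Map<String, String>>> {}

    private final List<Step> steps;

    CommandChain(List<Step> steps) {
        this.steps = new ArrayList<>(steps);
    }

    /** Pipes a copy of the record through every step until one drops it. */
    Optional<Map<String, String>> process(Map<String, String> input) {
        Map<String, String> copy = new LinkedHashMap<>(input);
        Optional<Map<String, String>> current = Optional.of(copy);
        for (Step step : steps) {
            if (!current.isPresent()) {
                break;
            }
            current = step.apply(current.get());
        }
        return current;
    }

    /** Sample step: drop records whose "message" field is missing or blank. */
    static Step dropEmptyMessages() {
        return record -> {
            if (record.getOrDefault("message", "").trim().isEmpty()) {
                return Optional.empty();
            }
            return Optional.of(record);
        };
    }

    /** Sample step: split "LEVEL text" into separate "level" and "text" fields. */
    static Step extractLevel() {
        return record -> {
            String[] parts = record.get("message").split(" ", 2);
            record.put("level", parts[0]);
            record.put("text", parts.length > 1 ? parts[1] : "");
            return Optional.of(record);
        };
    }

    public static void main(String[] args) {
        CommandChain chain = new CommandChain(Arrays.asList(dropEmptyMessages(), extractLevel()));
        Map<String, String> event = new LinkedHashMap<>();
        event.put("message", "WARN disk almost full");
        System.out.println(chain.process(event));
    }
}
```

Because the chain is just an ordered list, steps can be added, removed or swapped without touching the sink, which is the point of defining the ETL logic in a configuration file.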

Morphline Solr Sink can do everything that FLUME-1687 can do, and more.

Would be nice to merge those two efforts into one.



> ApacheSolrSink
> --
>
> Key: FLUME-1687
> URL: https://issues.apache.org/jira/browse/FLUME-1687
> Project: Flume
>  Issue Type: New Feature
>  Components: Sinks+Sources
>Affects Versions: v1.2.0, v1.4.0
>Reporter: wolfgang hoschek
>Assignee: Israel Ekpo
> Attachments: flume-new-feature-dependencies.zip, 
> flume-new-features-1.3.1.jar, flume-new-features-1.3.1-sources.jar
>
>
> Some use cases need near real time full text indexing of data through Flume 
> into Solr, where a Flume sink can write directly to a Solr search server. 
> This is a scalable way to provide low latency querying and data acquisition. 
> It complements (rather than replaces) use cases based on Map Reduce batch 
> analysis of HDFS data.
> Apache Solr has a client API that uses REST to add documents to a Solr 
> server, which in turn is based on Lucene. A Solr Sink can extract documents 
> from flume events and forward them to Solr.



[jira] [Updated] (FLUME-2124) Upgrade Morphline Solr Sink to CDK 0.4.1

2013-07-10 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2124:


Description: Now that CDK 0.4.1 has been released it would be good to 
upgrade the Morphline Solr Sink to that. Release notes are here: 
http://cloudera.github.io/cdk/docs/0.4.1/release_notes.html  (was: Now that CDK 
1.4.1 has been released it would be good to upgrade the Morphline Solr Sink to 
that. Release notes are here: 
http://cloudera.github.io/cdk/docs/0.4.1/release_notes.html)
Summary: Upgrade Morphline Solr Sink to CDK 0.4.1  (was: Upgrade 
Morphline Solr Sink to CDK 1.4.1)

> Upgrade Morphline Solr Sink to CDK 0.4.1
> 
>
> Key: FLUME-2124
> URL: https://issues.apache.org/jira/browse/FLUME-2124
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2124-v1.patch
>
>
> Now that CDK 0.4.1 has been released it would be good to upgrade the 
> Morphline Solr Sink to that. Release notes are here: 
> http://cloudera.github.io/cdk/docs/0.4.1/release_notes.html



[jira] [Commented] (FLUME-2124) Upgrade Morphline Solr Sink to CDK 0.4.1

2013-07-10 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705267#comment-13705267
 ] 

wolfgang hoschek commented on FLUME-2124:
-

yes, 0.4.1, sorry.

> Upgrade Morphline Solr Sink to CDK 0.4.1
> 
>
> Key: FLUME-2124
> URL: https://issues.apache.org/jira/browse/FLUME-2124
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2124-v1.patch
>
>
> Now that CDK 0.4.1 has been released it would be good to upgrade the 
> Morphline Solr Sink to that. Release notes are here: 
> http://cloudera.github.io/cdk/docs/0.4.1/release_notes.html



[jira] [Updated] (FLUME-2124) Upgrade Morphline Solr Sink to CDK 1.4.1

2013-07-10 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2124:


Attachment: FLUME-2124-v1.patch

> Upgrade Morphline Solr Sink to CDK 1.4.1
> 
>
> Key: FLUME-2124
> URL: https://issues.apache.org/jira/browse/FLUME-2124
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2124-v1.patch
>
>
> Now that CDK 1.4.1 has been released it would be good to upgrade the 
> Morphline Solr Sink to that. Release notes are here: 
> http://cloudera.github.io/cdk/docs/0.4.1/release_notes.html



[jira] [Created] (FLUME-2124) Upgrade Morphline Solr Sink to CDK 1.4.1

2013-07-10 Thread wolfgang hoschek (JIRA)
wolfgang hoschek created FLUME-2124:
---

 Summary: Upgrade Morphline Solr Sink to CDK 1.4.1
 Key: FLUME-2124
 URL: https://issues.apache.org/jira/browse/FLUME-2124
 Project: Flume
  Issue Type: Bug
  Components: Sinks+Sources
Affects Versions: v1.4.0
Reporter: wolfgang hoschek
 Fix For: v1.4.1, v1.5.0


Now that CDK 1.4.1 has been released it would be good to upgrade the Morphline 
Solr Sink to that. Release notes are here: 
http://cloudera.github.io/cdk/docs/0.4.1/release_notes.html



[jira] [Commented] (FLUME-2123) Morphline Solr sink missing short type name

2013-07-09 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704166#comment-13704166
 ] 

wolfgang hoschek commented on FLUME-2123:
-

Looks good to me. Thanks!

> Morphline Solr sink missing short type name
> ---
>
> Key: FLUME-2123
> URL: https://issues.apache.org/jira/browse/FLUME-2123
> Project: Flume
>  Issue Type: Bug
>Affects Versions: v1.4.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2123.patch
>
>
> only FQCN supported for Solr sink



[jira] [Updated] (FLUME-2105) Add docs for MorphlineSolrSink

2013-06-24 Thread wolfgang hoschek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated FLUME-2105:


Attachment: FLUME-2105-v2.patch

> Add docs for MorphlineSolrSink
> --
>
> Key: FLUME-2105
> URL: https://issues.apache.org/jira/browse/FLUME-2105
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>    Reporter: wolfgang hoschek
> Fix For: v1.4.0
>
> Attachments: FLUME-2105-v1.patch, FLUME-2105-v2.patch
>
>



