Review Request 18555: HTTP source handler doesn't allow for responses

2014-02-26 Thread Jeremy Karlson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18555/
---

Review request for Flume.


Bugs: FLUME-2333
https://issues.apache.org/jira/browse/FLUME-2333


Repository: flume-git


Description
---

Add a bidirectional HTTP handler to operate alongside the existing one.  This 
shouldn't break any existing functionality.
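
For context, a minimal sketch of what the proposed contract could look like, 
assuming it simply mirrors the existing HTTPSourceHandler interface with the 
servlet response added as a second parameter; the actual signatures are in the 
attached diff and may differ:

    import java.util.List;

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    import org.apache.flume.Event;
    import org.apache.flume.conf.Configurable;

    // Hypothetical sketch only; see the diff for the real interface.
    public interface BidirectionalHTTPSourceHandler extends Configurable {
      // The response is handed to the handler along with the request, so it
      // can set a status code, headers, and a body while extracting events.
      List<Event> getEvents(HttpServletRequest request,
                            HttpServletResponse response) throws Exception;
    }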


Diffs
-

  
flume-ng-core/src/main/java/org/apache/flume/source/http/BidirectionalHTTPSourceHandler.java
 PRE-CREATION 
  flume-ng-core/src/main/java/org/apache/flume/source/http/HTTPSource.java 
115b34f 
  flume-ng-core/src/test/java/org/apache/flume/source/http/TestHTTPSource.java 
5b07a6e 
  flume-ng-doc/sphinx/FlumeUserGuide.rst 8390cd2 

Diff: https://reviews.apache.org/r/18555/diff/


Testing
---

Unit testing.


Thanks,

Jeremy Karlson



[jira] [Updated] (FLUME-2333) HTTP source handler doesn't allow for responses

2014-02-26 Thread Jeremy Karlson (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Karlson updated FLUME-2333:
--

Description: 
Existing HTTP source handlers receive events via an HttpServletRequest.  This 
works, but because the handler doesn't have access to the HttpServletResponse, 
there is no way to return a response.  This makes it unsuitable for any 
protocol that relies on bidirectional communication.

My solution: In addition to the existing HTTPSourceHandler interface, I've 
added a BidirectionalHTTPSourceHandler interface that is given the servlet 
response as a parameter.  I've made some changes in the HTTP source to allow 
both handler types to co-exist, and my changes shouldn't affect anyone who is 
already using the existing interface.

Also includes minor documentation updates to reflect this.

Review: https://reviews.apache.org/r/18555/

  was:
Existing HTTP source handlers receive events via an HttpServletRequest.  This 
works, but because the handler doesn't have access to the HttpServletResponse, 
there is no way to return a response.  This makes it unsuitable for any 
protocol that relies on bidirectional communication.

My solution: In addition to the existing HTTPSourceHandler interface, I've 
added a BidirectionalHTTPSourceHandler interface that is given the servlet 
response as a parameter.  I've made some changes in the HTTP source to allow 
both handler types to co-exist, and my changes shouldn't affect anyone who is 
already using the existing interface.

Also includes minor documentation updates to reflect this.


> HTTP source handler doesn't allow for responses
> ---
>
> Key: FLUME-2333
> URL: https://issues.apache.org/jira/browse/FLUME-2333
> Project: Flume
>  Issue Type: Improvement
>Reporter: Jeremy Karlson
>Assignee: Jeremy Karlson
> Attachments: FLUME-2333.diff
>
>
> Existing HTTP source handlers receive events via an HttpServletRequest.  This 
> works, but because the handler doesn't have access to the 
> HttpServletResponse, there is no way to return a response.  This makes it 
> unsuitable for any protocol that relies on bidirectional communication.
> My solution: In addition to the existing HTTPSourceHandler interface, I've 
> added a BidirectionalHTTPSourceHandler interface that is given the servlet 
> response as a parameter.  I've made some changes in the HTTP source to allow 
> both handler types to co-exist, and my changes shouldn't affect anyone who is 
> already using the existing interface.
> Also includes minor documentation updates to reflect this.
> Review: https://reviews.apache.org/r/18555/



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (FLUME-2333) HTTP source handler doesn't allow for responses

2014-02-26 Thread Jeremy Karlson (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Karlson updated FLUME-2333:
--

Attachment: FLUME-2333.diff

> HTTP source handler doesn't allow for responses
> ---
>
> Key: FLUME-2333
> URL: https://issues.apache.org/jira/browse/FLUME-2333
> Project: Flume
>  Issue Type: Improvement
>Reporter: Jeremy Karlson
>Assignee: Jeremy Karlson
> Attachments: FLUME-2333.diff
>
>
> Existing HTTP source handlers receive events via an HttpServletRequest.  This 
> works, but because the handler doesn't have access to the 
> HttpServletResponse, there is no way to return a response.  This makes it 
> unsuitable for any protocol that relies on bidirectional communication.
> My solution: In addition to the existing HTTPSourceHandler interface, I've 
> added a BidirectionalHTTPSourceHandler interface that is given the servlet 
> response as a parameter.  I've made some changes in the HTTP source to allow 
> both handler types to co-exist, and my changes shouldn't affect anyone who is 
> already using the existing interface.
> Also includes minor documentation updates to reflect this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (FLUME-2333) HTTP source handler doesn't allow for responses

2014-02-26 Thread Jeremy Karlson (JIRA)
Jeremy Karlson created FLUME-2333:
-

 Summary: HTTP source handler doesn't allow for responses
 Key: FLUME-2333
 URL: https://issues.apache.org/jira/browse/FLUME-2333
 Project: Flume
  Issue Type: Improvement
Reporter: Jeremy Karlson


Existing HTTP source handlers receive events via an HttpServletRequest.  This 
works, but because the handler doesn't have access to the HttpServletResponse, 
there is no way to return a response.  This makes it unsuitable for any 
protocol that relies on bidirectional communication.

My solution: In addition to the existing HTTPSourceHandler interface, I've 
added a BidirectionalHTTPSourceHandler interface that is given the servlet 
response as a parameter.  I've made some changes in the HTTP source to allow 
both handler types to co-exist, and my changes shouldn't affect anyone who is 
already using the existing interface.

Also includes minor documentation updates to reflect this.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


File Channel Exception "Failed to obtain lock for writing to the log. Try increasing the log write timeout value"

2014-02-26 Thread Mangtani, Kushal
Hi,

I'm using the Flume NG 1.4 CDH 4.4 tarball for collecting aggregated logs.
I am running a two-tier (agent, collector) Flume configuration with custom 
plugins. There are approximately 20 agent machines (receiving data) and 6 
collector machines (writing to HDFS), all running independently. However, I 
have been facing some File Channel exceptions on the collector side. The 
agents appear to be working fine.

Error stacktrace:

org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=c2]
        at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
        at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:421)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        ...

I keep on getting the same error.

P.S.: The same exception is repeated on most of the flume collector machines, 
but not at the same time; there is usually a difference of a couple of hours 
or more between occurrences.

1. The HDFS sinks write to an Amazon EC2 cloud instance.
2. The data dir and checkpoint dir of the file channel on every collector 
instance are mounted on a separate Hadoop EBS drive. This ensures that no two 
collectors overlap their log and checkpoint dirs. There is a symbolic link, 
i.e. /usr/lib/flume-ng/datasource --> /hadoop/ebs/mnt-1.
3. Flume works fine for a couple of days, and all agents and collectors are 
initialized properly without exceptions.

Questions:

1. The exception "Failed to obtain lock for writing to the log. Try increasing 
the log write timeout value. [channel=c2]": according to the documentation, 
this exception occurs only if two processes are accessing the same 
file/directory. However, each channel is configured separately, so no two 
channels should access the same dir. Hence, this exception does not indicate 
anything. Please correct me if I'm wrong.

2. Also, hdfs.callTimeout bounds HDFS calls such as open and write; if there 
is no response within that duration, the call times out, and on timeout the 
sink closes the file. Please correct me if I'm wrong. Also, is there a way to 
specify the number of retries before it closes the file?
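
For reference, both timeouts live in the collector's properties file; a 
minimal sketch assuming the stock Flume 1.4 property names, with illustrative 
agent, channel, and sink names and example values:

    # File channel: lock timeout for log writes, in seconds. Raising it can
    # mask I/O contention rather than fix it.
    collector.channels.c2.type = file
    collector.channels.c2.write-timeout = 30

    # HDFS sink: timeout for HDFS open/write operations, in milliseconds.
    collector.sinks.k1.hdfs.callTimeout = 60000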

Your inputs/suggestions will be thoroughly appreciated.


Regards
Kushal Mangtani
Software Engineer



Review Request 18544: Hive Streaming sink

2014-02-26 Thread Roshan Naik

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18544/
---

Review request for Flume.


Bugs: FLUME-1734
https://issues.apache.org/jira/browse/FLUME-1734


Repository: flume-git


Description
---

Hive streaming sink.


Diffs
-

  flume-ng-doc/sphinx/FlumeUserGuide.rst 8390cd2 
  flume-ng-sinks/flume-hive-sink/pom.xml PRE-CREATION 
  
flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveSink.java
 PRE-CREATION 
  
flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveWriter.java
 PRE-CREATION 
  flume-ng-sinks/flume-hive-sink/src/test/resources/log4j.properties 
PRE-CREATION 
  flume-ng-sinks/pom.xml 6ac2b4d 
  pom.xml 362fb45 

Diff: https://reviews.apache.org/r/18544/diff/


Testing
---

This version lacks unit tests.


Thanks,

Roshan Naik



[jira] [Updated] (FLUME-1734) Create a Hive Sink based on the new Hive Streaming support

2014-02-26 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated FLUME-1734:
---

Attachment: FLUME-1734.draft.1.patch

Draft patch for review.  No tests currently.

> Create a Hive Sink based on the new Hive Streaming support
> --
>
> Key: FLUME-1734
> URL: https://issues.apache.org/jira/browse/FLUME-1734
> Project: Flume
>  Issue Type: New Feature
>  Components: Sinks+Sources
>Affects Versions: v1.2.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: features
> Attachments: FLUME-1734.draft.1.patch
>
>
> Create a sink that would stream data into HCatalog partitions. The primary 
> goal being that once the data is loaded into Hadoop, it should be 
> automatically queryable (using say Hive or Pig) without requiring additional 
> post processing steps on behalf of the users. Sink should manage the creation 
> of new partitions and committing them periodically. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-2225) Elasticsearch Sink for ES HTTP API

2014-02-26 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913276#comment-13913276
 ] 

Otis Gospodnetic commented on FLUME-2225:
-

[~hshreedharan] - I see another Guava upgrade to 14.x in FLUME-2286.

> Elasticsearch Sink for ES HTTP API
> --
>
> Key: FLUME-2225
> URL: https://issues.apache.org/jira/browse/FLUME-2225
> Project: Flume
>  Issue Type: New Feature
>Affects Versions: v1.5.0
>Reporter: Otis Gospodnetic
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2225-0.patch, FLUME-2225-1.patch
>
>
> Existing ElasticSearchSink uses ES TransportClient.  As such, one cannot use 
> the ES HTTP API, which is sometimes easier, and doesn't have issues around 
> client and server/cluster components using incompatible versions - currently, 
> both client and server/cluster need to be on the same version.
> See
> http://search-hadoop.com/m/k76HH9Te68/otis&subj=Elasticsearch+sink+that+uses+HTTP+API



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (FLUME-1227) Introduce some sort of SpillableChannel

2014-02-26 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913186#comment-13913186
 ] 

Otis Gospodnetic edited comment on FLUME-1227 at 2/26/14 5:20 PM:
--

Was just about to write to the ML asking about this functionality.  Looks like 
all known issues have been fixed, plus this is new functionality, so it should 
go in and get some real-world action, which we'd love to give it as soon as 
1.5.0 is out!

+10 for committing this.  Any chance of this going in before 1.5.0 is cut?  
It's got 32 eyeballs watching it, so there is clear interest.



was (Author: otis):
Was just about to write to the ML asking about this functionality.  Looks like 
all known issues have been fixed, plus this is new functionality, so it should 
go in and get some real-world action, which we'd love to give it as soon as 
1.5.0 is out!

+10 for committing this.  Any chance of this going in before 1.5.0 is cut?


> Introduce some sort of SpillableChannel
> ---
>
> Key: FLUME-1227
> URL: https://issues.apache.org/jira/browse/FLUME-1227
> Project: Flume
>  Issue Type: New Feature
>  Components: Channel
>Reporter: Jarek Jarcec Cecho
>Assignee: Roshan Naik
> Attachments: 1227.patch.1, FLUME-1227.v2.patch, FLUME-1227.v5.patch, 
> FLUME-1227.v6.patch, FLUME-1227.v7.patch, FLUME-1227.v8.patch, 
> FLUME-1227.v9.patch, SpillableMemory Channel Design 2.pdf, SpillableMemory 
> Channel Design.pdf
>
>
> I would like to introduce a new channel that would behave similarly to scribe 
> (https://github.com/facebook/scribe). It would be something between the 
> memory and file channels. Input events would be saved directly to memory 
> (only) and would be served from there. In case the memory is full, we would 
> outsource the events to a file.
> Let me describe the use case behind this request. We have plenty of frontend 
> servers that are generating events. We want to send all events to just a 
> limited number of machines from where we would send the data to HDFS (some 
> sort of staging layer). The reason for this second layer is our need to 
> decouple event aggregation and frontend code onto separate machines. Using 
> the memory channel is fully sufficient, as we can survive the loss of some 
> portion of the events. However, in order to sustain maintenance windows or 
> networking issues, we would have to end up with a lot of memory assigned to 
> those "staging" machines. The referenced "scribe" deals with this problem by 
> implementing the following logic - events are saved in memory, similarly to 
> our MemoryChannel. However, in case the memory gets full (because of 
> maintenance, networking issues, ...), it will spill data to disk, where it 
> will sit until everything starts working again.
> I would like to introduce a channel that would implement similar logic. Its 
> durability guarantees would be the same as MemoryChannel's - if someone 
> pulled the power cord, this channel would lose data. Based on the discussion 
> in FLUME-1201, I would propose to have the implementation completely 
> independent of any other channel's internal code.
> Jarcec
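
The spill behaviour described above amounts to a two-level queue; a minimal 
illustrative sketch of the idea (not the FLUME-1227 implementation), with a 
second in-memory deque standing in for the real file-backed store:

    import java.util.ArrayDeque;
    import java.util.Queue;

    // Two-level queue: serve from memory, spill to "disk" only on overflow.
    public class SpillableQueueSketch<E> {
      private final int memoryCapacity;
      private final Queue<E> memory = new ArrayDeque<>();
      private final Queue<E> disk = new ArrayDeque<>();  // stand-in for a file-backed queue

      public SpillableQueueSketch(int memoryCapacity) {
        this.memoryCapacity = memoryCapacity;
      }

      public void put(E event) {
        if (memory.size() < memoryCapacity) {
          memory.add(event);  // fast path: memory only
        } else {
          disk.add(event);    // memory full: spill
        }
      }

      public E take() {
        E e = memory.poll();  // drain memory first
        return (e != null) ? e : disk.poll();
      }
    }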



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel

2014-02-26 Thread Thilo Seidel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913190#comment-13913190
 ] 

Thilo Seidel commented on FLUME-1227:
-

Good day,
I am out of the office today. Your mail will neither be read nor automatically 
forwarded until my return.
Best regards,
Thilo Seidel


> Introduce some sort of SpillableChannel
> ---
>
> Key: FLUME-1227
> URL: https://issues.apache.org/jira/browse/FLUME-1227
> Project: Flume
>  Issue Type: New Feature
>  Components: Channel
>Reporter: Jarek Jarcec Cecho
>Assignee: Roshan Naik
> Attachments: 1227.patch.1, FLUME-1227.v2.patch, FLUME-1227.v5.patch, 
> FLUME-1227.v6.patch, FLUME-1227.v7.patch, FLUME-1227.v8.patch, 
> FLUME-1227.v9.patch, SpillableMemory Channel Design 2.pdf, SpillableMemory 
> Channel Design.pdf
>
>
> I would like to introduce a new channel that would behave similarly to scribe 
> (https://github.com/facebook/scribe). It would be something between the 
> memory and file channels. Input events would be saved directly to memory 
> (only) and would be served from there. In case the memory is full, we would 
> outsource the events to a file.
> Let me describe the use case behind this request. We have plenty of frontend 
> servers that are generating events. We want to send all events to just a 
> limited number of machines from where we would send the data to HDFS (some 
> sort of staging layer). The reason for this second layer is our need to 
> decouple event aggregation and frontend code onto separate machines. Using 
> the memory channel is fully sufficient, as we can survive the loss of some 
> portion of the events. However, in order to sustain maintenance windows or 
> networking issues, we would have to end up with a lot of memory assigned to 
> those "staging" machines. The referenced "scribe" deals with this problem by 
> implementing the following logic - events are saved in memory, similarly to 
> our MemoryChannel. However, in case the memory gets full (because of 
> maintenance, networking issues, ...), it will spill data to disk, where it 
> will sit until everything starts working again.
> I would like to introduce a channel that would implement similar logic. Its 
> durability guarantees would be the same as MemoryChannel's - if someone 
> pulled the power cord, this channel would lose data. Based on the discussion 
> in FLUME-1201, I would propose to have the implementation completely 
> independent of any other channel's internal code.
> Jarcec



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel

2014-02-26 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913186#comment-13913186
 ] 

Otis Gospodnetic commented on FLUME-1227:
-

Was just about to write to the ML asking about this functionality.  Looks like 
all known issues have been fixed, plus this is new functionality, so it should 
go in and get some real-world action, which we'd love to give it as soon as 
1.5.0 is out!

+10 for committing this.  Any chance of this going in before 1.5.0 is cut?


> Introduce some sort of SpillableChannel
> ---
>
> Key: FLUME-1227
> URL: https://issues.apache.org/jira/browse/FLUME-1227
> Project: Flume
>  Issue Type: New Feature
>  Components: Channel
>Reporter: Jarek Jarcec Cecho
>Assignee: Roshan Naik
> Attachments: 1227.patch.1, FLUME-1227.v2.patch, FLUME-1227.v5.patch, 
> FLUME-1227.v6.patch, FLUME-1227.v7.patch, FLUME-1227.v8.patch, 
> FLUME-1227.v9.patch, SpillableMemory Channel Design 2.pdf, SpillableMemory 
> Channel Design.pdf
>
>
> I would like to introduce a new channel that would behave similarly to scribe 
> (https://github.com/facebook/scribe). It would be something between the 
> memory and file channels. Input events would be saved directly to memory 
> (only) and would be served from there. In case the memory is full, we would 
> outsource the events to a file.
> Let me describe the use case behind this request. We have plenty of frontend 
> servers that are generating events. We want to send all events to just a 
> limited number of machines from where we would send the data to HDFS (some 
> sort of staging layer). The reason for this second layer is our need to 
> decouple event aggregation and frontend code onto separate machines. Using 
> the memory channel is fully sufficient, as we can survive the loss of some 
> portion of the events. However, in order to sustain maintenance windows or 
> networking issues, we would have to end up with a lot of memory assigned to 
> those "staging" machines. The referenced "scribe" deals with this problem by 
> implementing the following logic - events are saved in memory, similarly to 
> our MemoryChannel. However, in case the memory gets full (because of 
> maintenance, networking issues, ...), it will spill data to disk, where it 
> will sit until everything starts working again.
> I would like to introduce a channel that would implement similar logic. Its 
> durability guarantees would be the same as MemoryChannel's - if someone 
> pulled the power cord, this channel would lose data. Based on the discussion 
> in FLUME-1201, I would propose to have the implementation completely 
> independent of any other channel's internal code.
> Jarcec



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Issue Comment Deleted] (FLUME-2307) Remove Log writetimeout

2014-02-26 Thread Arun (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun updated FLUME-2307:


Comment: was deleted

(was: Is it possible to port this patch to 1.4.0 branch?)

> Remove Log writetimeout
> ---
>
> Key: FLUME-2307
> URL: https://issues.apache.org/jira/browse/FLUME-2307
> Project: Flume
>  Issue Type: Bug
>  Components: Channel
>Affects Versions: v1.4.0
>Reporter: Steve Zesch
>Assignee: Hari Shreedharan
> Fix For: v1.5.0
>
> Attachments: FLUME-2307-1.patch, FLUME-2307.patch
>
>
> I've observed Flume failing to clean up old log data in FileChannels. The 
> amount of old log data can range anywhere from tens to hundreds of GB. I was 
> able to confirm that the channels were in fact empty. This behavior always 
> occurs after lock timeouts when attempting to put, take, rollback, or commit 
> to a FileChannel. Once the timeout occurs, Flume stops cleaning up the old 
> files. I was able to confirm that the Log's writeCheckpoint method was still 
> being called and successfully obtaining a lock from tryLockExclusive(), but I 
> was not able to confirm removeOldLogs being called. The application log did 
> not include "Removing old file: log-xyz" for the old files which the Log 
> class would output if they were correctly being removed. I suspect the lock 
> timeouts were due to high I/O load at the time.
> Some stack traces:
> {code}
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel]
> at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doPut(FileChannel.java:478)
> at org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93)
> at org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:80)
> at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:189)
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel]
> at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:594)
> at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
> at dataxu.flume.plugins.avro.AsyncAvroSink.process(AsyncAvroSink.java:548)
> at dataxu.flume.plugins.ClassLoaderFlumeSink.process(ClassLoaderFlumeSink.java:33)
> at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> at java.lang.Thread.run(Thread.java:619)
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel]
> at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
> at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
> at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
> at dataxu.flume.plugins.avro.AvroSource.appendBatch(AvroSource.java:209)
> at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.avro.ipc.specific.SpecificResponder.respond(SpecificResponder.java:91)
> at org.apache.avro.ipc.Responder.respond(Responder.java:151)
> at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
> at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
> at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
> at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792)
> at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:321)
> at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:303)
> at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:220)
> at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUps

[jira] [Commented] (FLUME-2307) Remove Log writetimeout

2014-02-26 Thread Arun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912985#comment-13912985
 ] 

Arun commented on FLUME-2307:
-

Is it possible to port this patch to 1.4.0 branch?

> Remove Log writetimeout
> ---
>
> Key: FLUME-2307
> URL: https://issues.apache.org/jira/browse/FLUME-2307
> Project: Flume
>  Issue Type: Bug
>  Components: Channel
>Affects Versions: v1.4.0
>Reporter: Steve Zesch
>Assignee: Hari Shreedharan
> Fix For: v1.5.0
>
> Attachments: FLUME-2307-1.patch, FLUME-2307.patch
>
>
> I've observed Flume failing to clean up old log data in FileChannels. The 
> amount of old log data can range anywhere from tens to hundreds of GB. I was 
> able to confirm that the channels were in fact empty. This behavior always 
> occurs after lock timeouts when attempting to put, take, rollback, or commit 
> to a FileChannel. Once the timeout occurs, Flume stops cleaning up the old 
> files. I was able to confirm that the Log's writeCheckpoint method was still 
> being called and successfully obtaining a lock from tryLockExclusive(), but I 
> was not able to confirm removeOldLogs being called. The application log did 
> not include "Removing old file: log-xyz" for the old files which the Log 
> class would output if they were correctly being removed. I suspect the lock 
> timeouts were due to high I/O load at the time.
> Some stack traces:
> {code}
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel]
> at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doPut(FileChannel.java:478)
> at org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93)
> at org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:80)
> at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:189)
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel]
> at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:594)
> at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
> at dataxu.flume.plugins.avro.AsyncAvroSink.process(AsyncAvroSink.java:548)
> at dataxu.flume.plugins.ClassLoaderFlumeSink.process(ClassLoaderFlumeSink.java:33)
> at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> at java.lang.Thread.run(Thread.java:619)
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=fileChannel]
> at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
> at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
> at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
> at dataxu.flume.plugins.avro.AvroSource.appendBatch(AvroSource.java:209)
> at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.avro.ipc.specific.SpecificResponder.respond(SpecificResponder.java:91)
> at org.apache.avro.ipc.Responder.respond(Responder.java:151)
> at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
> at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
> at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
> at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792)
> at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:321)
> at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:303)
> at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:220)
> at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleU

[jira] [Updated] (FLUME-2331) Large TMP files created and never closed

2014-02-26 Thread Krishna Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Kumar updated FLUME-2331:
-

Attachment: flumelog.zip

Thanks! Attached is a zipped log file that contains several hours' worth of 
logging data from Flume.

> Large TMP files created and never closed
> 
>
> Key: FLUME-2331
> URL: https://issues.apache.org/jira/browse/FLUME-2331
> Project: Flume
>  Issue Type: Bug
>  Components: File Channel
>Affects Versions: v1.4.0
>Reporter: Krishna Kumar
> Attachments: flumelog.zip
>
>
> We are currently writing files to Hadoop partitioned by year, month, and day 
> via Flume. File rollovers are done every 5 minutes. Recently, we noticed that 
> this file rollover stops happening sometime during the day and that further 
> data is written to an open TMP file. Because there are no further file 
> rollovers, this TMP file becomes very large. At the end of the day, the TMP 
> file is also not closed, and Flume moves on to the next day, creating new files.
> We use a "." prefix to prevent Hive from complaining about the open TMP file. 
> Because of this issue where the TMP file is never closed, the file remains 
> hidden to Hive even after the day ends.
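
A setup like the one described maps onto the HDFS sink configuration roughly 
as in this sketch; the property names are the stock HDFS sink ones, while the 
sink name, path, and values are illustrative, not the reporter's actual config:

    agent.sinks.hdfs1.type = hdfs
    # Path partitioned by year, month, and day.
    agent.sinks.hdfs1.hdfs.path = /data/logs/%Y/%m/%d
    # "." prefix keeps the in-flight TMP file hidden from Hive.
    agent.sinks.hdfs1.hdfs.inUsePrefix = .
    # Roll every 5 minutes; disable size- and count-based rolling.
    agent.sinks.hdfs1.hdfs.rollInterval = 300
    agent.sinks.hdfs1.hdfs.rollSize = 0
    agent.sinks.hdfs1.hdfs.rollCount = 0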



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-2324) Support writing to multiple HBase clusters using HBaseSink

2014-02-26 Thread Gopinathan A (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912746#comment-13912746
 ] 

Gopinathan A commented on FLUME-2324:
-

+1 for the patch.

Small suggestion: please update this patch with Flume User Guide 
documentation; it would also be better to mention the configuration precedence.

> Support writing to multiple HBase clusters using HBaseSink
> --
>
> Key: FLUME-2324
> URL: https://issues.apache.org/jira/browse/FLUME-2324
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2324.patch
>
>
> The AsyncHBaseSink can already write to multiple HBase clusters, but 
> HBaseSink cannot. 
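
In configuration terms, the patch would allow something like the sketch below: 
two HBaseSinks in one agent pointed at different clusters by overriding the 
ZooKeeper quorum per sink. The property and sink names here assume the 
FLUME-2324 patch and are illustrative; the final names may differ.

    agent.sinks.hbaseA.type = hbase
    agent.sinks.hbaseA.table = events_a
    agent.sinks.hbaseA.columnFamily = cf
    # Cluster A quorum, overriding whatever hbase-site.xml on the classpath says.
    agent.sinks.hbaseA.zookeeperQuorum = zk-a1:2181,zk-a2:2181,zk-a3:2181

    agent.sinks.hbaseB.type = hbase
    agent.sinks.hbaseB.table = events_b
    agent.sinks.hbaseB.columnFamily = cf
    # Cluster B quorum.
    agent.sinks.hbaseB.zookeeperQuorum = zk-b1:2181,zk-b2:2181,zk-b3:2181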



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)