[jira] [Assigned] (FLUME-2802) Folder name interceptor

2015-09-26 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar reassigned FLUME-2802:
---

Assignee: Eran W

> Folder name interceptor
> ---
>
> Key: FLUME-2802
> URL: https://issues.apache.org/jira/browse/FLUME-2802
> Project: Flume
>  Issue Type: New Feature
>Reporter: Eran W
>Assignee: Eran W
> Attachments: FLUME-2802.patch
>
>
> This interceptor retrieves the last folder name from the 
> SpoolDir.fileHeaderKey and sets it on the given folderKey.
> This allows users to set the target HDFS directory based on the source 
> directory rather than the whole path or file name. 
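
For readers who have not opened the patch, the idea in the description can be sketched roughly as below. This is not the code from FLUME-2802.patch: the class and method names are hypothetical, and a plain header map stands in for a Flume event so the sketch runs with the JDK alone.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical helper; names are illustrative and not taken from the patch.
class FolderNameSketch {

  // Returns the name of the directory that directly contains the file,
  // or null when the path has no named parent directory.
  static String lastFolderName(String filePath) {
    Path parent = Paths.get(filePath).getParent();
    if (parent == null || parent.getFileName() == null) {
      return null;
    }
    return parent.getFileName().toString();
  }

  // Reads the file path stored under fileHeaderKey and, if a folder name
  // can be derived from it, stores that name under folderKey.
  static void addFolderHeader(Map<String, String> headers,
                              String fileHeaderKey, String folderKey) {
    String filePath = headers.get(fileHeaderKey);
    if (filePath == null) {
      return;
    }
    String folder = lastFolderName(filePath);
    if (folder != null) {
      headers.put(folderKey, folder);
    }
  }
}
```

With a header named "folder" populated this way, an HDFS sink path could presumably reference it through the usual %{folder} header escape.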



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLUME-2564) Failover processor does not kick-in for HDFS sink on IOException

2014-11-27 Thread Arvind Prabhakar (JIRA)
Arvind Prabhakar created FLUME-2564:
---

 Summary: Failover processor does not kick-in for HDFS sink on 
IOException
 Key: FLUME-2564
 URL: https://issues.apache.org/jira/browse/FLUME-2564
 Project: Flume
  Issue Type: Bug
Reporter: Arvind Prabhakar
Assignee: Arvind Prabhakar


From a recent thread on the user mailing list:

{quote}
I investigated the HDFSEventSink source code and found that when the exception 
is an IOException, it is not propagated to the upper layer, 
so the FailoverSinkProcessor does not mark this sink as dead.
{quote} 

{code}
   
} catch (IOException eIO) {
  transaction.rollback();
  LOG.warn("HDFS IO error", eIO);
  return Status.BACKOFF;
} catch (Throwable th) {
  transaction.rollback();
  LOG.error("process failed", th);
  if (th instanceof Error) {
    throw (Error) th;
  } else {
    throw new EventDeliveryException(th);
  }
}

{code}

The failover processor should be able to use the backoff signal as indication 
of failure and switch over to the next sink.
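
A minimal sketch of the proposed behavior, using JDK types only. The Sink interface and Status enum here are simplified stand-ins declared inside the sketch, not the Flume classes, and the selection loop is an assumption about how such a processor could work rather than the eventual fix.

```java
import java.util.List;

// Simplified stand-ins for illustration; not the Flume Sink/Status types.
class FailoverSketch {
  enum Status { READY, BACKOFF }

  interface Sink {
    Status process() throws Exception;
  }

  // Tries sinks in priority order. Both a thrown exception and a BACKOFF
  // status are treated as failure, so the next sink gets a chance.
  // Returns the index of the sink that succeeded, or -1 if none did.
  static int processWithFailover(List<Sink> sinksByPriority) {
    for (int i = 0; i < sinksByPriority.size(); i++) {
      try {
        if (sinksByPriority.get(i).process() == Status.READY) {
          return i;
        }
      } catch (Exception e) {
        // Treated the same as BACKOFF: fall through to the next sink.
      }
    }
    return -1;
  }
}
```

One caveat: as far as I know a sink can also report BACKOFF when it simply has nothing to deliver, so a real change would need a way to tell that case apart from an I/O failure.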





[jira] [Commented] (FLUME-2365) Please create a DOAP file for your TLP

2014-06-15 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032013#comment-14032013
 ] 

Arvind Prabhakar commented on FLUME-2365:
-

[~hshreedharan] - I updated the files.xml in the site repository. Once the 
project shows up correctly on http://projects.apache.org/indexes/alpha.html#F 
we can go ahead and close this Jira out.

> Please create a DOAP file for your TLP
> --
>
> Key: FLUME-2365
> URL: https://issues.apache.org/jira/browse/FLUME-2365
> Project: Flume
>  Issue Type: Task
>Reporter: Sebb
>Assignee: Ashish Paliwal
> Attachments: flume.rdf
>
>
> As per my recent e-mail to your dev list, please can you set up a DOAP for 
> your project and get it added to files.xml?
> Please see http://projects.apache.org/create.html
> Once you have created the DOAP and committed it to your source code 
> repository, please submit it for inclusion in the Apache projects listing as 
> per:
> http://projects.apache.org/create.html#submit
> Remember, if you ever move or rename the doap file in future, please
> ensure that files.xml is updated to point to the new location.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2191) HDFS Minicluster tests failing after protobuf upgrade.

2013-10-03 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785791#comment-13785791
 ] 

Arvind Prabhakar commented on FLUME-2191:
-

+1 changes look good to me. Will commit after a sanity run.

> HDFS Minicluster tests failing after protobuf upgrade.
> --
>
> Key: FLUME-2191
> URL: https://issues.apache.org/jira/browse/FLUME-2191
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
>Priority: Blocker
> Attachments: FLUME-2191.patch
>
>
> I ran the full build in hadoop-1 profile, but it looks like the protobuf 
> upgrade broke the hadoop-2 profile. The HDFS Sink test on Minicluster fails 
> with this:
> {code}
> Running org.apache.flume.sink.hdfs.TestHDFSEventSinkOnMiniCluster
> 2013-09-13 12:11:31.159 java[58566:1203] Unable to load realm info from 
> SCDynamicStore
> 2013-09-13 12:11:31.208 java[58566:1203] Unable to load realm info from 
> SCDynamicStore
> Tests run: 4, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 4.238 sec <<< 
> FAILURE!
> simpleHDFSTest(org.apache.flume.sink.hdfs.TestHDFSEventSinkOnMiniCluster)  
> Time elapsed: 1979 sec  <<< ERROR!
> java.lang.UnsupportedOperationException: This is supposed to be overridden by 
> subclasses.
>   at 
> com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetDatanodeReportRequestProto.getSerializedSize(ClientNamenodeProtocolProtos.java:21638)
>   at 
> com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:137)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:181)
>   at com.sun.proxy.$Proxy15.getDatanodeReport(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:84)
>   at com.sun.proxy.$Proxy15.getDatanodeReport(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDatanodeReport(ClientNamenodeProtocolTranslatorPB.java:488)
>   at org.apache.hadoop.hdfs.DFSClient.datanodeReport(DFSClient.java:1642)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.waitActive(MiniDFSCluster.java:1703)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.waitActive(MiniDFSCluster.java:1722)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1066)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:929)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:588)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:527)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:398)
>   at 
> org.apache.flume.sink.hdfs.TestHDFSEventSinkOnMiniCluster.simpleHDFSTest(TestHDFSEventSinkOnMiniCluster.java:85)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>   at org.junit.runners.ParentR

[jira] [Assigned] (FLUME-2192) AbstractSinkProcessor stop incorrectly calls start

2013-10-01 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar reassigned FLUME-2192:
---

Assignee: Jeremy Karlson

> AbstractSinkProcessor stop incorrectly calls start
> --
>
> Key: FLUME-2192
> URL: https://issues.apache.org/jira/browse/FLUME-2192
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.4.0, v1.3.1
>Reporter: Jeremy Karlson
>Assignee: Jeremy Karlson
> Fix For: v1.4.1, v1.5.0
>
> Attachments: FLUME-2192.patch
>
>
> AbstractSinkProcessor incorrectly calls start when trying to stop.  Patch is 
> attached.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (FLUME-2199) Flume builds with new version require mvn install before site can be generated

2013-10-01 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783567#comment-13783567
 ] 

Arvind Prabhakar commented on FLUME-2199:
-

Thanks for the patch Andrew. Do you mind publishing a review request?

> Flume builds with new version require mvn install before site can be generated
> --
>
> Key: FLUME-2199
> URL: https://issues.apache.org/jira/browse/FLUME-2199
> Project: Flume
>  Issue Type: Bug
>  Components: Build
>Affects Versions: v1.4.0
>Reporter: Andrew Bayer
>Assignee: Andrew Bayer
> Fix For: v1.5.0
>
> Attachments: FLUME-2199.patch
>
>
> At this point, if you change the version for Flume, you need to run a mvn 
> install before you can run with -Psite (or, for that matter, javadoc:javadoc) 
> enabled. This is because the top-level POM in flume.git/pom.xml is both the 
> parent POM and the root of the reactor - since it's the parent, it's got to 
> run before any of the children that inherit from it, but site generation 
> should be running *after* all the children, so that it probably pulls in the 
> reactor's build of each child module, rather than having to pull in one 
> already installed/deployed before the build starts.
> There are a bunch of other reasons to split parent POM and top-level POM, but 
> that's the biggest one right there. 
> Also, the javadoc jar generation is a bit messed up - every module's javadoc 
> jar contains not only its own javadocs but the javadocs for every Flume 
> module it depends on. That, again, may make sense in a site context for the 
> top-level, but not for the individual modules. This results in unnecessary 
> bloat in the javadoc jars, and unnecessary time spent downloading the 
> "*-javadoc-resources.jar" for every dependency each module has, due to how 
> the javadoc plugin works. Also the whole site generation per-module thing, 
> which I am not a fan of in most cases. I don't think it's needed here. 
> Tweaking the site plugin not to run anywhere but the top-level and the 
> javadoc plugin to not do the dependency aggregation anywhere but the 
> top-level should make a big difference on build speed.





[jira] [Assigned] (FLUME-2199) Flume builds with new version require mvn install before site can be generated

2013-10-01 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar reassigned FLUME-2199:
---

Assignee: Andrew Bayer

> Flume builds with new version require mvn install before site can be generated
> --
>
> Key: FLUME-2199
> URL: https://issues.apache.org/jira/browse/FLUME-2199
> Project: Flume
>  Issue Type: Bug
>  Components: Build
>Affects Versions: v1.4.0
>Reporter: Andrew Bayer
>Assignee: Andrew Bayer
> Fix For: v1.5.0
>
> Attachments: FLUME-2199.patch
>
>
> At this point, if you change the version for Flume, you need to run a mvn 
> install before you can run with -Psite (or, for that matter, javadoc:javadoc) 
> enabled. This is because the top-level POM in flume.git/pom.xml is both the 
> parent POM and the root of the reactor - since it's the parent, it's got to 
> run before any of the children that inherit from it, but site generation 
> should be running *after* all the children, so that it probably pulls in the 
> reactor's build of each child module, rather than having to pull in one 
> already installed/deployed before the build starts.
> There are a bunch of other reasons to split parent POM and top-level POM, but 
> that's the biggest one right there. 
> Also, the javadoc jar generation is a bit messed up - every module's javadoc 
> jar contains not only its own javadocs but the javadocs for every Flume 
> module it depends on. That, again, may make sense in a site context for the 
> top-level, but not for the individual modules. This results in unnecessary 
> bloat in the javadoc jars, and unnecessary time spent downloading the 
> "*-javadoc-resources.jar" for every dependency each module has, due to how 
> the javadoc plugin works. Also the whole site generation per-module thing, 
> which I am not a fan of in most cases. I don't think it's needed here. 
> Tweaking the site plugin not to run anywhere but the top-level and the 
> javadoc plugin to not do the dependency aggregation anywhere but the 
> top-level should make a big difference on build speed.





[jira] [Commented] (FLUME-2173) Exactly once semantics for Flume

2013-08-28 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753284#comment-13753284
 ] 

Arvind Prabhakar commented on FLUME-2173:
-

(continuing the discussion here instead of email)

Thanks Hari. In the spirit of keeping processing components pluggable, it would 
make sense to have this de-dupe logic pluggable itself. One benefit of doing so 
would be the choice of different implementations that could provide different 
degrees of guarantees. For example, the ZK based approach over the entire 
pipeline could provide a complete once-only delivery guarantee but, as you 
pointed out, could add latency to delivery. Alternatively, there could be locally 
optimized implementations of this approach that act on subsets of the event 
stream and thus benefit partitioned deployments where events cannot cross wires.

Another use-case to consider would be to locally optimize for multiple channels 
within the same Agent. That way an Agent that has a File Channel setup as the 
primary channel and a Memory Channel setup as a fall-back channel in case the 
primary is full - would need local deduping without having to store state in ZK.
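
To make the pluggability point concrete, here is a rough sketch. The Deduplicator interface and the LocalDeduplicator class are hypothetical names, not an existing Flume API; a ZK-backed implementation could sit behind the same interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical pluggable de-dupe contract; not an existing Flume API.
class DedupSketch {
  interface Deduplicator {
    // Returns true the first time an event id is seen, false for a duplicate.
    boolean firstSeen(String eventId);
  }

  // Local, in-memory implementation that remembers at most `capacity`
  // recent ids and forgets the oldest one when that limit is exceeded.
  static final class LocalDeduplicator implements Deduplicator {
    private final Map<String, Boolean> seen;

    LocalDeduplicator(final int capacity) {
      this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
          return size() > capacity;
        }
      };
    }

    @Override
    public synchronized boolean firstSeen(String eventId) {
      return seen.put(eventId, Boolean.TRUE) == null;
    }
  }
}
```

The bounded memory is the trade-off of the local variant: once an id has been evicted, a late duplicate is no longer caught, so the guarantee is weaker than a pipeline-wide store.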



> Exactly once semantics for Flume
> 
>
> Key: FLUME-2173
> URL: https://issues.apache.org/jira/browse/FLUME-2173
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
>
> Currently Flume guarantees only at least once semantics. This jira is meant 
> to track exactly once semantics for Flume. My initial idea is to include uuid 
> event ids on events at the original source (use a config to mark a source an 
> original source) and identify destination sinks. At the destination sinks, 
> use a unique ZK Znode to track the events. If once seen (and configured), 
> pull the duplicate out.
> This might need some refactoring, but my belief is we can do this in a 
> backward compatible way.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (FLUME-2140) Support diverting bad events from pipeline

2013-08-07 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732732#comment-13732732
 ] 

Arvind Prabhakar commented on FLUME-2140:
-

[Discussion 
thread|http://flume.markmail.org/thread/y3cks6hdgof3kxu6#query:+page:1+mid:rx3zm53t4dhmqskk+state:results]
 on this subject in the user-list for reference.

> Support diverting bad events from pipeline
> --
>
> Key: FLUME-2140
> URL: https://issues.apache.org/jira/browse/FLUME-2140
> Project: Flume
>  Issue Type: New Feature
>  Components: Node
>Reporter: Arvind Prabhakar
>
> A *bad event* can be any event that causes persistent sink side processing 
> failure due to the inherent nature of the event itself. Note that failures 
> that are not related to the inherent nature of the event such as network 
> communication failure, downstream capacity failure etc., do not make the 
> event a bad-event.
> The presence of a bad event in a channel can cause the entire pipeline to 
> choke and become unusable. Flume should therefore be able to identify bad 
> events and provide a facility to route them out of the pipeline in order to 
> ensure the transport of other events continues uninterrupted.



[jira] [Commented] (FLUME-2140) Support diverting bad events from pipeline

2013-08-02 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727912#comment-13727912
 ] 

Arvind Prabhakar commented on FLUME-2140:
-

Another case - a downstream filter is buggy and causes a batch to fail 
repeatedly due to a malformed header or some other details.

> Support diverting bad events from pipeline
> --
>
> Key: FLUME-2140
> URL: https://issues.apache.org/jira/browse/FLUME-2140
> Project: Flume
>  Issue Type: New Feature
>  Components: Node
>Reporter: Arvind Prabhakar
>
> A *bad event* can be any event that causes persistent sink side processing 
> failure due to the inherent nature of the event itself. Note that failures 
> that are not related to the inherent nature of the event such as network 
> communication failure, downstream capacity failure etc., do not make the 
> event a bad-event.
> The presence of a bad event in a channel can cause the entire pipeline to 
> choke and become unusable. Flume should therefore be able to identify bad 
> events and provide a facility to route them out of the pipeline in order to 
> ensure the transport of other events continues uninterrupted.



[jira] [Created] (FLUME-2140) Support diverting bad events from pipeline

2013-08-01 Thread Arvind Prabhakar (JIRA)
Arvind Prabhakar created FLUME-2140:
---

 Summary: Support diverting bad events from pipeline
 Key: FLUME-2140
 URL: https://issues.apache.org/jira/browse/FLUME-2140
 Project: Flume
  Issue Type: New Feature
  Components: Node
Reporter: Arvind Prabhakar


A *bad event* can be any event that causes persistent sink side processing 
failure due to the inherent nature of the event itself. Note that failures that 
are not related to the inherent nature of the event such as network 
communication failure, downstream capacity failure etc., do not make the event 
a bad-event.

The presence of a bad event in a channel can cause the entire pipeline to 
choke and become unusable. Flume should therefore be able to identify bad 
events and provide a facility to route them out of the pipeline in order to 
ensure the transport of other events continues uninterrupted.
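
One possible shape for such a facility, sketched with JDK types only. The method name, the retry limit, and the use of a Predicate as the sink are all assumptions made for illustration; nothing here reflects an agreed design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; the sink is modeled as a predicate that returns
// true when an event was processed successfully.
class BadEventSketch {

  // Attempts each event up to maxAttempts times. Events that keep failing
  // are added to `diverted` instead of blocking the events behind them.
  // Returns the events that were delivered.
  static List<String> deliver(List<String> events, Predicate<String> sink,
                              int maxAttempts, List<String> diverted) {
    List<String> delivered = new ArrayList<>();
    for (String event : events) {
      boolean ok = false;
      for (int attempt = 0; attempt < maxAttempts && !ok; attempt++) {
        ok = sink.test(event);
      }
      if (ok) {
        delivered.add(event);
      } else {
        diverted.add(event);
      }
    }
    return delivered;
  }
}
```

A retry count alone cannot tell an inherently bad event from a transient failure such as a network outage, so a real implementation would also need some signal from the sink about the cause of the failure.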




[jira] [Commented] (FLUME-1502) Support for running simple configurations embedded in host process

2012-11-05 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491105#comment-13491105
 ] 

Arvind Prabhakar commented on FLUME-1502:
-

@Brock, thanks for the design document. On the point of File Channel, I do feel 
that it is important to have that support to ensure that we do not put 
excessive strain on memory for the host process, and that we do not lose events 
in the case of host process failure.

Another point to consider is whether the source would be any different from a 
regular source when running in embedded mode. For example, does it make sense 
to have an embedded agent with a network source like Avro running in it? It may 
make more sense to have no source support at all, but rather a direct 
pass-through in the client API that talks directly to the channel in 
question. 

> Support for running simple configurations embedded in host process
> --
>
> Key: FLUME-1502
> URL: https://issues.apache.org/jira/browse/FLUME-1502
> Project: Flume
>  Issue Type: Improvement
>Affects Versions: v1.2.0
>Reporter: Arvind Prabhakar
>Assignee: Brock Noland
> Attachments: embeeded-agent-1.pdf
>
>
> Flume should provide a light-weight embeddable node manager that can be 
> started in process where necessary. This will allow the users to embed 
> light-weight agents within the host process where necessary.



[jira] [Commented] (FLUME-1573) Duplicated HDFS file name when multiple SinkRunner was existing

2012-09-28 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465681#comment-13465681
 ] 

Arvind Prabhakar commented on FLUME-1573:
-

@Denny - a sink is an independent, isolated component of Flume. It cannot 
assume any knowledge of other sink(s) operating within the same agent. Having a 
synchronization requirement across multiple sinks breaks this invariant.

However, if within the same sink there are problems due to collisions between 
different bucket writers, that would be a bug and merits fixing. From the 
explanation above that does not seem to be the case to me.

> Duplicated HDFS file name when multiple SinkRunner was existing
> ---
>
> Key: FLUME-1573
> URL: https://issues.apache.org/jira/browse/FLUME-1573
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Affects Versions: v1.2.0
>Reporter: Denny Ye
>Assignee: Denny Ye
> Fix For: v1.3.0
>
> Attachments: FLUME-1573.patch
>
>
> Multiple HDFS sinks write events into storage, and a timeout exception 
> occurs every time:
> {code:xml}
> 11 Sep 2012 07:04:53,478 WARN  
> [SinkRunner-PollingRunner-DefaultSinkProcessor] 
> (org.apache.flume.sink.hdfs.HDFSEventSink.process:442)  - HDFS IO error
> java.io.IOException: Callable timed out after 1 ms
> at 
> org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:342)
> at 
> org.apache.flume.sink.hdfs.HDFSEventSink.append(HDFSEventSink.java:713)
> at 
> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:412)
> at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.util.concurrent.TimeoutException
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
> at java.util.concurrent.FutureTask.get(FutureTask.java:91)
> at 
> org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:335)
> ... 5 more
> {code}
> I suspected an HDFS timeout or slow response. As expected, I found a 
> duplicate-creation exception for the same file in HDFS. 
> Flume also logged the same duplicated file name.
> {code:xml}
> 13 Sep 2012 02:09:35,432 INFO  [hdfs-hdfsSink-3-call-runner-7] 
> (org.apache.flume.sink.hdfs.BucketWriter.doOpen:189)  - Creating 
> /FLUME/dt=2012-09-13/02-host.1347501924111.tmp
> 13 Sep 2012 02:09:36,425 INFO  [hdfs-hdfsSink-4-call-runner-8] 
> (org.apache.flume.sink.hdfs.BucketWriter.doOpen:189)  - Creating 
> /FLUME/dt=2012-09-13/02-host.1347501924111.tmp
> {code}
> Different threads tried to create the same file, even though they ran at 
> different times.
> The root cause may be incorrect use of the AtomicLong property named 
> 'fileExtensionCounter' in BucketWriter. Different threads should share one 
> counter protected by CAS, rather than each holding its own private counter. 
> Private counters do nothing to avoid HDFS path conflicts.
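
The reporter's claim can be illustrated with a small standalone sketch. The Writer class below is hypothetical and is not BucketWriter; it only assumes that each path is built from a prefix plus a counter value, and that private counters can start from the same seed.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a file-naming writer; not Flume's BucketWriter.
class FileNameCounterSketch {
  static final class Writer {
    private final String prefix;
    private final AtomicLong counter;

    Writer(String prefix, AtomicLong counter) {
      this.prefix = prefix;
      this.counter = counter;
    }

    // Builds "<prefix>.<counter>.tmp" from the next counter value.
    String nextPath() {
      return prefix + "." + counter.incrementAndGet() + ".tmp";
    }
  }

  public static void main(String[] args) {
    long seed = 1347501924110L;
    // Two private counters with the same seed produce the same path.
    Writer a = new Writer("/FLUME/02-host", new AtomicLong(seed));
    Writer b = new Writer("/FLUME/02-host", new AtomicLong(seed));
    System.out.println(a.nextPath().equals(b.nextPath()));

    // One shared counter hands out a distinct value to each writer.
    AtomicLong shared = new AtomicLong(seed);
    Writer c = new Writer("/FLUME/02-host", shared);
    Writer d = new Writer("/FLUME/02-host", shared);
    System.out.println(c.nextPath().equals(d.nextPath()));
  }
}
```

Sharing a counter across sinks is exactly the kind of cross-sink coupling the comment on this issue argues against, so this sketch only shows why the collision happens, not which fix is appropriate.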



[jira] [Created] (FLUME-1502) Support for running simple configurations embedded in host process

2012-08-21 Thread Arvind Prabhakar (JIRA)
Arvind Prabhakar created FLUME-1502:
---

 Summary: Support for running simple configurations embedded in 
host process
 Key: FLUME-1502
 URL: https://issues.apache.org/jira/browse/FLUME-1502
 Project: Flume
  Issue Type: Improvement
Affects Versions: v1.2.0
Reporter: Arvind Prabhakar


Flume should provide a light-weight embeddable node manager that can be started 
in process where necessary. This will allow the users to embed light-weight 
agents within the host process where necessary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (FLUME-1424) File Channel should support encryption

2012-08-07 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429992#comment-13429992
 ] 

Arvind Prabhakar commented on FLUME-1424:
-

Yes, the put records do store the data in them. We can perhaps start with that 
as a first step and if more requirements pop-up, we can address them in 
follow-up Jiras as necessary.

> File Channel should support encryption
> --
>
> Key: FLUME-1424
> URL: https://issues.apache.org/jira/browse/FLUME-1424
> Project: Flume
>  Issue Type: Bug
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
>
> When persisting the data to disk, the File Channel should allow some form of 
> encryption to ensure safety of data.





[jira] [Commented] (FLUME-1424) File Channel should support encryption

2012-08-03 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428457#comment-13428457
 ] 

Arvind Prabhakar commented on FLUME-1424:
-

@Ralph - this is definitely one way to address this requirement. The advantage 
(and perhaps a disadvantage at the same time) of this approach is that it will 
only incorporate encryption for the put records. 

Another way to do this is to implement encryption at the LogFile.Writer/Reader 
level where the byte buffers are serialized between transaction boundaries. 
This approach will have a higher performance penalty but would encrypt every 
file channel record regardless of type.
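
Whichever level is chosen, the core step is encrypting a serialized byte buffer before it reaches disk. A minimal sketch of that step with the JDK's javax.crypto API follows; the class name, the AES/GCM choice, and the IV-prefixed layout are my assumptions for illustration and not a proposal for the actual file format.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch; not part of the File Channel code.
class RecordEncryptionSketch {
  private static final int IV_BYTES = 12;   // assumed IV size for GCM
  private static final int TAG_BITS = 128;  // assumed authentication tag size

  // Generates a fresh 128-bit AES key (key management is out of scope here).
  static SecretKey newKey() {
    try {
      KeyGenerator generator = KeyGenerator.getInstance("AES");
      generator.init(128);
      return generator.generateKey();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  // Encrypts one serialized record; the output is the random IV followed
  // by the ciphertext and authentication tag.
  static byte[] encrypt(SecretKey key, byte[] record) {
    try {
      byte[] iv = new byte[IV_BYTES];
      new SecureRandom().nextBytes(iv);
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
      byte[] ciphertext = cipher.doFinal(record);
      return ByteBuffer.allocate(iv.length + ciphertext.length)
          .put(iv).put(ciphertext).array();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  // Reverses encrypt(): reads the IV prefix, then decrypts and verifies.
  static byte[] decrypt(SecretKey key, byte[] stored) {
    try {
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.DECRYPT_MODE, key,
          new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
      return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Applying this only to put records keeps the cost limited to event data, while applying it at the writer/reader level would run every record type through the same two calls.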


> File Channel should support encryption
> --
>
> Key: FLUME-1424
> URL: https://issues.apache.org/jira/browse/FLUME-1424
> Project: Flume
>  Issue Type: Bug
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
>
> When persisting the data to disk, the File Channel should allow some form of 
> encryption to ensure safety of data.





[jira] [Assigned] (FLUME-1424) File Channel should support encryption

2012-08-02 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar reassigned FLUME-1424:
---

Assignee: Arvind Prabhakar

> File Channel should support encryption
> --
>
> Key: FLUME-1424
> URL: https://issues.apache.org/jira/browse/FLUME-1424
> Project: Flume
>  Issue Type: Bug
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
>
> When persisting the data to disk, the File Channel should allow some form of 
> encryption to ensure safety of data.





[jira] [Created] (FLUME-1424) File Channel should support encryption

2012-08-02 Thread Arvind Prabhakar (JIRA)
Arvind Prabhakar created FLUME-1424:
---

 Summary: File Channel should support encryption
 Key: FLUME-1424
 URL: https://issues.apache.org/jira/browse/FLUME-1424
 Project: Flume
  Issue Type: Bug
Reporter: Arvind Prabhakar


When persisting the data to disk, the File Channel should allow some form of 
encryption to ensure safety of data.





[jira] [Updated] (FLUME-1380) File channel log can record the op code and not the operation in some cases

2012-07-18 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated FLUME-1380:


Attachment: FLUME-1380-1.patch

> File channel log can record the op code and not the operation in some cases
> ---
>
> Key: FLUME-1380
> URL: https://issues.apache.org/jira/browse/FLUME-1380
> Project: Flume
>  Issue Type: Bug
>Reporter: Arvind Prabhakar
>Assignee: Arvind Prabhakar
> Attachments: FLUME-1380-1.patch
>
>
> There is a race condition in the system where the log file can record the 
> beginning of a record and be shutdown before the remaining record is written 
> out. This will lead to the system not starting up correctly again with 
> exceptions like:
> {noformat}
> ERROR file.Log: Failed to initialize Log
> java.io.IOException: Header 80808080 not expected value: deadbeef
> {noformat}





[jira] [Created] (FLUME-1380) File channel log can record the op code and not the operation in some cases

2012-07-18 Thread Arvind Prabhakar (JIRA)
Arvind Prabhakar created FLUME-1380:
---

 Summary: File channel log can record the op code and not the 
operation in some cases
 Key: FLUME-1380
 URL: https://issues.apache.org/jira/browse/FLUME-1380
 Project: Flume
  Issue Type: Bug
Reporter: Arvind Prabhakar
Assignee: Arvind Prabhakar


There is a race condition in the system where the log file can record the 
beginning of a record and be shutdown before the remaining record is written 
out. This will lead to the system not starting up correctly again with 
exceptions like:

{noformat}
ERROR file.Log: Failed to initialize Log
java.io.IOException: Header 80808080 not expected value: deadbeef
{noformat}
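
One general way to close this kind of race is to make the write of a whole record and the shutdown mutually exclusive. The sketch below shows that pattern only; the class, the record layout (op code, length, payload), and the in-memory sink are hypothetical and do not describe the attached patch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch of the locking pattern; an in-memory stream stands
// in for the log file.
class RecordWriterSketch {
  private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
  private final Object lock = new Object();
  private boolean open = true;

  // Assembles op code, length and payload first, then appends them as one
  // unit under the lock. Returns false if the writer is already closed.
  boolean write(byte opCode, byte[] payload) {
    byte[] record = ByteBuffer.allocate(1 + 4 + payload.length)
        .put(opCode).putInt(payload.length).put(payload).array();
    synchronized (lock) {
      if (!open) {
        return false;
      }
      sink.write(record, 0, record.length);
      return true;
    }
  }

  // Takes the same lock, so it cannot run between the op code and the
  // rest of a record.
  void close() {
    synchronized (lock) {
      open = false;
    }
  }

  byte[] contents() {
    synchronized (lock) {
      return sink.toByteArray();
    }
  }
}
```

This only guards against an in-process shutdown racing with a write; a crash in the middle of a disk write would still need handling on the read side.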
