[jira] [Commented] (FLUME-2624) Improve Hive Sink performance

2015-04-02 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392926#comment-14392926
 ] 

Brock Noland commented on FLUME-2624:
-

This commit included the metastore_db directory, which should not be committed. 
Can you remove it and add it to .gitignore?
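
For example, a one-line .gitignore entry should cover it (path assumed relative 
to the repo root):

{noformat}
# Derby metastore directory created by the Hive sink tests
metastore_db/
{noformat}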

> Improve Hive Sink performance
> -
>
> Key: FLUME-2624
> URL: https://issues.apache.org/jira/browse/FLUME-2624
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
> Attachments: FLUME-2624.2.patch, FLUME-2624.patch
>
>
> Currently in Hive Sink one record is written at a time to Hive from the 
> batch. Writing multiple records at a time should help.
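
A rough sketch of the difference, assuming the hcatalog streaming API this sink 
is based on (TransactionBatch offers both single-record and multi-record write 
calls); the batch-assembly code here is illustrative, not the actual sink:

{code}
// Instead of one streaming call per event:
//   for (Event event : batch) { txnBatch.write(event.getBody()); }
// hand the whole channel batch to Hive in a single call:
List<byte[]> records = new ArrayList<byte[]>(batch.size());
for (Event event : batch) {   // batch = events taken from the channel
  records.add(event.getBody());
}
txnBatch.write(records);      // TransactionBatch.write(Collection<byte[]>)
txnBatch.commit();
{code}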



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-1734) Create a Hive Sink based on the new Hive Streaming support

2014-12-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252203#comment-14252203
 ] 

Brock Noland commented on FLUME-1734:
-

That sounds fine to me. +1 on the statement of experimental status.

> Create a Hive Sink based on the new Hive Streaming support
> --
>
> Key: FLUME-1734
> URL: https://issues.apache.org/jira/browse/FLUME-1734
> Project: Flume
>  Issue Type: New Feature
>  Components: Sinks+Sources
>Affects Versions: v1.2.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: features
> Attachments: FLUME-1734.draft.1.patch, FLUME-1734.draft.2.patch, 
> FLUME-1734.v1.patch, FLUME-1734.v2.patch, FLUME-1734.v4.patch, 
> FLUME-1734.v5.patch, FLUME-1734.v6.patch
>
>
> Hive 0.13 has introduced Streaming support which is itself transactional in 
> nature and fits well with Flume's transaction model.
> A short overview of Hive's Streaming support, on which this sink is based, can 
> be found here:
> http://hive.apache.org/javadocs/r0.13.1/api/hcatalog/streaming/index.html
> This jira is for creating a sink that would continuously stream data into 
> Hive tables using the above APIs. The primary goal is that the data 
> streamed by the sink should be instantly queryable (using, say, Hive or Pig) 
> without requiring additional post-processing steps on behalf of the users. 
> The sink should manage the creation of new partitions periodically if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2450) Improve replay index insertion speed.

2014-09-01 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117824#comment-14117824
 ] 

Brock Noland commented on FLUME-2450:
-

bq. I had 45 gigs of data parked in the file channel; with the patch flume 
took about 25 mins to figure itself out

Could you share how many events were in the queue? Also, was that for a full 
replay? Are you using backup checkpoints?

bq. The frustration right now for us is that our flume nodes are basically 
'down' until this recovery completes.

Are your nodes performing a full recovery often? Are you using backup 
checkpoints? Unless both the checkpoint and the backup checkpoint are gone, a 
replay should be quite fast.
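
(For reference, backup checkpoints are the File Channel dual-checkpoint feature; 
a hedged config sketch, directory path hypothetical:)

{noformat}
agent.channels.c1.type = file
agent.channels.c1.useDualCheckpoints = true
agent.channels.c1.backupCheckpointDir = /flume/backup-checkpoint
{noformat}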

bq. Make a new config option to run the version that requires extending the 
amount of JVM memory

This actually would not improve recovery much. 



> Improve replay index insertion speed.
> -
>
> Key: FLUME-2450
> URL: https://issues.apache.org/jira/browse/FLUME-2450
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Fix For: v1.6.0
>
> Attachments: FLUME-2450.patch
>
>
> Insertion into the replay index can sometimes take a long time because we use 
> a file-based index and tree set. We should switch this out for a memory-mapped 
> db and a hash set.
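
One library offering exactly this combination is MapDB; a rough sketch under 
that assumption (MapDB 1.x API, file name hypothetical, and not necessarily what 
the attached patch does):

{code}
DB db = DBMaker.newFileDB(new File(checkpointDir, "replay.db"))
    .mmapFileEnable()        // back the store with a memory-mapped file
    .transactionDisable()    // the replay index is throwaway; skip the WAL
    .make();
Set<Long> replayIndex = db.getHashSet("replayIndex");  // hash set, not tree set
{code}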



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2450) Improve replay index insertion speed.

2014-08-29 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115672#comment-14115672
 ] 

Brock Noland commented on FLUME-2450:
-

+1

> Improve replay index insertion speed.
> -
>
> Key: FLUME-2450
> URL: https://issues.apache.org/jira/browse/FLUME-2450
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2450.patch
>
>
> Insertion into the replay index can sometimes take a long time because we use 
> a file-based index and tree set. We should switch this out for a memory-mapped 
> db and a hash set.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2425) FileChannel should trim data and checkpoint directories

2014-08-22 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107310#comment-14107310
 ] 

Brock Noland commented on FLUME-2425:
-

+1

> FileChannel should trim data and checkpoint directories
> ---
>
> Key: FLUME-2425
> URL: https://issues.apache.org/jira/browse/FLUME-2425
> Project: Flume
>  Issue Type: Bug
>Affects Versions: v1.5.0.1
>Reporter: Brock Noland
> Attachments: FLUME-2425-1.patch, FLUME-2425.patch
>
>
> Today, if you place a space in the comma-separated list of data directories, 
> you get odd results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2425) FileChannel should trim data and checkpoint directories

2014-08-22 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107105#comment-14107105
 ] 

Brock Noland commented on FLUME-2425:
-

I think we should use Guava Splitter here to omit empties and trim strings.
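
For example (a sketch; dataDirsCsv stands for the raw configuration value):

{code}
Iterable<String> dataDirs = Splitter.on(',')
    .trimResults()         // strip whitespace around each path
    .omitEmptyStrings()    // drop empties from stray or trailing commas
    .split(dataDirsCsv);
{code}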

> FileChannel should trim data and checkpoint directories
> ---
>
> Key: FLUME-2425
> URL: https://issues.apache.org/jira/browse/FLUME-2425
> Project: Flume
>  Issue Type: Bug
>Affects Versions: v1.5.0.1
>Reporter: Brock Noland
> Attachments: FLUME-2425.patch
>
>
> Today, if you place a space in the comma-separated list of data directories, 
> you get odd results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2251) Add support for Kafka Sink

2014-08-04 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085405#comment-14085405
 ] 

Brock Noland commented on FLUME-2251:
-

Hi,

What's the status on this? 

> Add support for Kafka Sink
> --
>
> Key: FLUME-2251
> URL: https://issues.apache.org/jira/browse/FLUME-2251
> Project: Flume
>  Issue Type: Sub-task
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2251-0.patch, FLUME-2251.patch, FLUME-2251.patch
>
>
> Add support for Kafka Sink



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (FLUME-2425) FileChannel should trim data and checkpoint directories

2014-07-16 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2425:
---

 Summary: FileChannel should trim data and checkpoint directories
 Key: FLUME-2425
 URL: https://issues.apache.org/jira/browse/FLUME-2425
 Project: Flume
  Issue Type: Bug
Reporter: Brock Noland


Today, if you place a space in the comma-separated list of data directories, 
you get odd results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (FLUME-2418) Log4J Appender should have batching option

2014-07-07 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2418:
---

 Summary: Log4J Appender should have batching option
 Key: FLUME-2418
 URL: https://issues.apache.org/jira/browse/FLUME-2418
 Project: Flume
  Issue Type: Improvement
Reporter: Brock Noland


Today the log4j appender does a single event -> single request to the Avro 
Source. This performs terribly; we should allow for an optional queue + batch.
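
A rough sketch of the shape this could take (names and structure hypothetical, 
not the actual appender code; RpcClient.appendBatch is the existing client API):

{code}
private final List<Event> buffer = new ArrayList<Event>();
private int batchSize = 100;            // would come from appender config

public synchronized void append(LoggingEvent event) {
  buffer.add(toFlumeEvent(event));      // existing single-event conversion
  if (buffer.size() >= batchSize) {
    flush();
  }
}

private synchronized void flush() {
  try {
    rpcClient.appendBatch(buffer);      // one Avro request instead of N
    buffer.clear();
  } catch (EventDeliveryException e) {
    throw new FlumeException("Failed to send batch", e);
  }
}
{code}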



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (FLUME-2245) HDFS files with errors unable to close

2014-05-15 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2245:


Attachment: FLUME-2245.patch

Attached is the patch which fixed the issue for me.

> HDFS files with errors unable to close
> --
>
> Key: FLUME-2245
> URL: https://issues.apache.org/jira/browse/FLUME-2245
> Project: Flume
>  Issue Type: Bug
>Reporter: Juhani Connolly
> Attachments: FLUME-2245.patch, flume.log.1133, flume.log.file
>
>
> This is running on a snapshot of Flume-1.5 with the git hash 
> 99db32ccd163daf9d7685f0e8485941701e1133d
> When a datanode goes unresponsive for a significant amount of time (for 
> example, a big gc), an append failure will occur, followed by repeated 
> timeouts appearing in the log and a failure to close the stream. The relevant 
> section of the logs is attached (where it first starts appearing).
> The same log repeats periodically, consistently running into a 
> TimeoutException.
> Restarting flume (or presumably just the HDFSSink) solves the issue.
> Probable cause in comments



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (FLUME-2383) Add option to enable agent exit if a component fails to start

2014-05-14 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2383:
---

 Summary: Add option to enable agent exit if a component fails to 
start
 Key: FLUME-2383
 URL: https://issues.apache.org/jira/browse/FLUME-2383
 Project: Flume
  Issue Type: Improvement
Reporter: Brock Noland


Some users do not like that our agent continues to run despite a failed 
component. We should have an option so users can tell the flume agent to exit 
if any component fails to start.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2245) HDFS files with errors unable to close

2014-05-13 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997254#comment-13997254
 ] 

Brock Noland commented on FLUME-2245:
-

Hi,

Yes, I was able to test the BucketWriter change (and only that) and 
I found it fixed this issue.

Note: I used kill -STOP on the DN to reproduce.

> HDFS files with errors unable to close
> --
>
> Key: FLUME-2245
> URL: https://issues.apache.org/jira/browse/FLUME-2245
> Project: Flume
>  Issue Type: Bug
>Reporter: Juhani Connolly
> Attachments: flume.log.1133, flume.log.file
>
>
> This is running on a snapshot of Flume-1.5 with the git hash 
> 99db32ccd163daf9d7685f0e8485941701e1133d
> When a datanode goes unresponsive for a significant amount of time (for 
> example, a big gc), an append failure will occur, followed by repeated 
> timeouts appearing in the log and a failure to close the stream. The relevant 
> section of the logs is attached (where it first starts appearing).
> The same log repeats periodically, consistently running into a 
> TimeoutException.
> Restarting flume (or presumably just the HDFSSink) solves the issue.
> Probable cause in comments



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2245) HDFS files with errors unable to close

2014-05-13 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997278#comment-13997278
 ] 

Brock Noland commented on FLUME-2245:
-

Note that it's just the try and catch on the flush.

> HDFS files with errors unable to close
> --
>
> Key: FLUME-2245
> URL: https://issues.apache.org/jira/browse/FLUME-2245
> Project: Flume
>  Issue Type: Bug
>Reporter: Juhani Connolly
> Attachments: FLUME-2245.patch, flume.log.1133, flume.log.file
>
>
> This is running on a snapshot of Flume-1.5 with the git hash 
> 99db32ccd163daf9d7685f0e8485941701e1133d
> When a datanode goes unresponsive for a significant amount of time (for 
> example, a big gc), an append failure will occur, followed by repeated 
> timeouts appearing in the log and a failure to close the stream. The relevant 
> section of the logs is attached (where it first starts appearing).
> The same log repeats periodically, consistently running into a 
> TimeoutException.
> Restarting flume (or presumably just the HDFSSink) solves the issue.
> Probable cause in comments



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2351) Ability to override any parameter from the configuration file

2014-05-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995352#comment-13995352
 ] 

Brock Noland commented on FLUME-2351:
-

Generally I think this is ok. I don't see it documented in the flume-ng 
command? Also the link to RB is wrong.

> Ability to override any parameter from the configuration file
> -
>
> Key: FLUME-2351
> URL: https://issues.apache.org/jira/browse/FLUME-2351
> Project: Flume
>  Issue Type: Improvement
>  Components: Node
>Affects Versions: v1.5.0
>Reporter: Krisztian Horvath
>Priority: Minor
>  Labels: features, patch
> Fix For: v1.5.0
>
> Attachments: FLUME-2351.patch
>
>
> To start flume agents dynamically, it comes in handy to be able to override 
> parameters of a base configuration file without actually touching or 
> modifying it, for example to change the bind port.
> Example:
> agent.sources.avro-collection-source.bind = localhost
> agent.sources.avro-collection-source.port = 5
> agent.sinks.hdfs-sink.hdfs.path = hdfs://localhost:9000/flume
> agent.channels.memoryChannel.capacity = 2
> flume-ng agent -n agent -f flume.conf 
> -o avro-collection-source.bind=0.0.0.0 
> -o avro-collection-source.port=3
> -o hdfs-sink.hdfs.path=hdfs://localhost:9000/data
> -o memoryChannel.capacity=3



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2181) Optionally disable File Channel fsyncs

2014-05-01 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986650#comment-13986650
 ] 

Brock Noland commented on FLUME-2181:
-

+1

Thank you Hari!  I will run tests and commit as soon as I can.

> Optionally disable File Channel fsyncs 
> ---
>
> Key: FLUME-2181
> URL: https://issues.apache.org/jira/browse/FLUME-2181
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2181-1.patch, FLUME-2181-2.patch, 
> FLUME-2181-3.patch, FLUME-2181.patch
>
>
> This will give File Channel performance a big boost, at the cost of possible 
> data loss if a crash happens between checkpoints. 
> Also we should make it configurable, with default to false. If the user does 
> not mind slight inconsistencies, this feature can be explicitly enabled 
> through configuration. So if it is not configured, then the behavior will be 
> exactly as it is now.
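
For reference, enabling it would look something like this (a hedged sketch using 
the fsyncPerTransaction / fsyncInterval knobs discussed elsewhere in this 
thread; values illustrative):

{noformat}
agent.channels.c1.type = file
agent.channels.c1.fsyncPerTransaction = false
agent.channels.c1.fsyncInterval = 10
{noformat}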



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (FLUME-2181) Optionally disable File Channel fsyncs

2014-04-10 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2181:


Attachment: FLUME-2181-2.patch

[~hshreedharan] I had some spare time so I rebased the patch on trunk. The 
tests pass as well.

{noformat}
[INFO] Apache Flume ...................................... SUCCESS [1.074s]
[INFO] Flume NG SDK ...................................... SUCCESS [1:15.330s]
[INFO] Flume NG Configuration ............................ SUCCESS [1.417s]
[INFO] Flume NG Core ..................................... SUCCESS [6:45.411s]
[INFO] Flume NG Sinks .................................... SUCCESS [0.060s]
[INFO] Flume NG HDFS Sink ................................ SUCCESS [1:20.956s]
[INFO] Flume NG IRC Sink ................................. SUCCESS [1.247s]
[INFO] Flume NG Channels ................................. SUCCESS [0.067s]
[INFO] Flume NG JDBC channel ............................. SUCCESS [2:07.739s]
[INFO] Flume NG file-based channel ....................... SUCCESS [11:07.351s]
[INFO] Flume NG Spillable Memory channel ................. SUCCESS [3:44.141s]
[INFO] Flume NG Node ..................................... SUCCESS [27.442s]
[INFO] Flume NG Embedded Agent ........................... SUCCESS [11.879s]
[INFO] Flume NG HBase Sink ............................... SUCCESS [3:18.274s]
[INFO] Flume NG ElasticSearch Sink ....................... SUCCESS [42.384s]
[INFO] Flume NG Morphline Solr Sink ...................... SUCCESS [13.304s]
[INFO] Flume Sources ..................................... SUCCESS [0.030s]
[INFO] Flume Scribe Source ............................... SUCCESS [0.601s]
[INFO] Flume JMS Source .................................. SUCCESS [13.362s]
[INFO] Flume Twitter Source .............................. SUCCESS [0.906s]
[INFO] Flume legacy Sources .............................. SUCCESS [0.028s]
[INFO] Flume legacy Avro source .......................... SUCCESS [1.510s]
[INFO] Flume legacy Thrift Source ........................ SUCCESS [2.302s]
[INFO] Flume NG Clients .................................. SUCCESS [0.026s]
[INFO] Flume NG Log4j Appender ........................... SUCCESS [23.146s]
[INFO] Flume NG Tools .................................... SUCCESS [6.739s]
[INFO] Flume NG distribution ............................. SUCCESS [7.759s]
[INFO] Flume NG Integration Tests ........................ SUCCESS [53.966s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 33:08.887s
[INFO] Finished at: Thu Apr 10 18:19:13 CDT 2014
[INFO] Final Memory: 267M/973M
[INFO] ------------------------------------------------------------------------
{noformat}

> Optionally disable File Channel fsyncs 
> ---
>
> Key: FLUME-2181
> URL: https://issues.apache.org/jira/browse/FLUME-2181
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2181-1.patch, FLUME-2181-2.patch, FLUME-2181.patch
>
>
> This will give File Channel performance a big boost, at the cost of possible 
> data loss if a crash happens between checkpoints. 
> Also we should make it configurable, with default to false. If the user does 
> not mind slight inconsistencies, this feature can be explicitly enabled 
> through configuration. So if it is not configured, then the behavior will be 
> exactly as it is now.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (FLUME-2078) FileChannel periodic fsync should be supported

2014-04-03 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland resolved FLUME-2078.
-

Resolution: Duplicate

> FileChannel periodic fsync should be supported
> --
>
> Key: FLUME-2078
> URL: https://issues.apache.org/jira/browse/FLUME-2078
> Project: Flume
>  Issue Type: New Feature
>  Components: File Channel
>Reporter: Brock Noland
>
> It would be nice to have the option to do a periodic fsync as opposed to 
> fsync on every commit. This would be an option and would be disabled by 
> default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2181) Optionally disable File Channel fsyncs

2014-04-03 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959591#comment-13959591
 ] 

Brock Noland commented on FLUME-2181:
-

[~hshreedharan] can you post an updated patch? I'd like to get this in.

> Optionally disable File Channel fsyncs 
> ---
>
> Key: FLUME-2181
> URL: https://issues.apache.org/jira/browse/FLUME-2181
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2181-1.patch, FLUME-2181.patch
>
>
> This will give File Channel performance a big boost, at the cost of possible 
> data loss if a crash happens between checkpoints. 
> Also we should make it configurable, with default to false. If the user does 
> not mind slight inconsistencies, this feature can be explicitly enabled 
> through configuration. So if it is not configured, then the behavior will be 
> exactly as it is now.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (FLUME-2347) Add FLUME_JAVA_OPTS which allows users to inject java properties from cmd line

2014-03-20 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2347:
---

 Summary: Add FLUME_JAVA_OPTS which allows users to inject java 
properties from cmd line
 Key: FLUME-2347
 URL: https://issues.apache.org/jira/browse/FLUME-2347
 Project: Flume
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: FLUME-2347.patch

In order to set java properties such as -X, -D, and -javaagent we have the 
following:

* flume-ng takes -X and -D as native properties
* JAVA_OPTS can be placed in the flume-env.sh file

However, there is no way to set properties on the command line which do not 
start with -X or -D.

eg.

env JAVA_OPTS="-javaagent" flume-ng 

Therefore I suggest we introduce FLUME_JAVA_OPTS, which sets properties from the 
environment when starting the flume-ng command. This will not impact users whose 
JAVA_OPTS settings in the non-flume environment are incompatible with flume.
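
For example (agent jar path hypothetical):

env FLUME_JAVA_OPTS="-javaagent:/path/to/agent.jar" flume-ng agent -n a1 -f flume.conf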



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (FLUME-2347) Add FLUME_JAVA_OPTS which allows users to inject java properties from cmd line

2014-03-20 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2347:


Attachment: FLUME-2347.patch

> Add FLUME_JAVA_OPTS which allows users to inject java properties from cmd line
> --
>
> Key: FLUME-2347
> URL: https://issues.apache.org/jira/browse/FLUME-2347
> Project: Flume
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: FLUME-2347.patch
>
>
> In order to set java properties such as -X, -D, and -javaagent we have the 
> following:
> * flume-ng takes -X and -D as native properties
> * JAVA_OPTS can be placed in the flume-env.sh file
> However, there is no way to set properties on the command line which do not 
> start with -X or -D.
> eg.
> env JAVA_OPTS="-javaagent" flume-ng 
> Therefore I suggest we introduce FLUME_JAVA_OPTS, which sets properties from 
> the environment when starting the flume-ng command. This will not impact users 
> whose JAVA_OPTS settings in the non-flume environment are incompatible with 
> flume.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2346) idLogFileMap in Log can lose track of file ids

2014-03-17 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938060#comment-13938060
 ] 

Brock Noland commented on FLUME-2346:
-

Good catch! FYI [~hshreedharan]

> idLogFileMap in Log can lose track of file ids
> --
>
> Key: FLUME-2346
> URL: https://issues.apache.org/jira/browse/FLUME-2346
> Project: Flume
>  Issue Type: Bug
>  Components: File Channel
>Affects Versions: v1.4.0
>Reporter: Steve Zesch
>
> The following code from Log's writeCheckpoint method can lose track of file 
> id -> Reader mappings, which will lead to an NPE being thrown in subsequent 
> calls to writeCheckpoint.
> {code:title=Log#writeCheckpoint}
> Iterator<Integer> idIterator = logFileRefCountsAll.iterator();
> while (idIterator.hasNext()) {
>   int id = idIterator.next();
>   LogFile.RandomReader reader = idLogFileMap.remove(id);
>   File file = reader.getFile();
>   reader.close();
>   LogFile.MetaDataWriter writer =
>       LogFileFactory.getMetaDataWriter(file, id);
>   try {
>     writer.markCheckpoint(logWriteOrderID);
>   } finally {
>     writer.close();
>   }
>   reader = LogFileFactory.getRandomReader(file, encryptionKeyProvider);
>   idLogFileMap.put(id, reader);
>   LOGGER.debug("Updated checkpoint for file: " + file
>       + "logWriteOrderID " + logWriteOrderID);
>   idIterator.remove();
> }
> {code}
> The problem occurs when writer.markCheckpoint throws an exception and the id 
> -> reader mapping is not added back to idLogFileMap. The next time 
> writeCheckpoint is called logFileRefCountsAll still contains the file id, but 
> idLogFileMap.remove(id) returns null since the id is no longer in the map. 
> Attempting to call reader.getFile() then throws an NPE.
> Is there a reason that the initial reader obtained from idLogFileMap is 
> closed and then a new reader for the same file is later created and inserted 
> into the map? If that is not required, then one possible solution would be to 
> remove this logic and not remove the id -> reader mapping in idLogFileMap. If 
> that logic is required, then perhaps the code to insert a new id -> reader 
> mapping into idLogFileMap could be added to the finally block.
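
A minimal sketch of that last option (illustrative, not a tested patch):

{code}
LogFile.MetaDataWriter writer = LogFileFactory.getMetaDataWriter(file, id);
try {
  writer.markCheckpoint(logWriteOrderID);
} finally {
  writer.close();
  // Re-insert the mapping even if markCheckpoint threw, so the next
  // writeCheckpoint call does not NPE on a missing reader.
  reader = LogFileFactory.getRandomReader(file, encryptionKeyProvider);
  idLogFileMap.put(id, reader);
}
{code}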



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (FLUME-2342) Document FLUME-2311 - Use standard way of finding queue/topic

2014-03-05 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2342:
---

 Summary: Document FLUME-2311 - Use standard way of finding 
queue/topic
 Key: FLUME-2342
 URL: https://issues.apache.org/jira/browse/FLUME-2342
 Project: Flume
  Issue Type: Bug
Reporter: Brock Noland


FLUME-2311 is a critical fix, but it was not documented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2307) Remove Log writetimeout

2014-02-10 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896990#comment-13896990
 ] 

Brock Noland commented on FLUME-2307:
-

+1

I cannot test/commit as I am on an airplane.

> Remove Log writetimeout
> ---
>
> Key: FLUME-2307
> URL: https://issues.apache.org/jira/browse/FLUME-2307
> Project: Flume
>  Issue Type: Bug
>  Components: Channel
>Affects Versions: v1.4.0
>Reporter: Steve Zesch
>Assignee: Hari Shreedharan
> Attachments: FLUME-2307-1.patch, FLUME-2307.patch
>
>
> I've observed Flume failing to clean up old log data in FileChannels. The 
> amount of old log data can range anywhere from tens to hundreds of GB. I was 
> able to confirm that the channels were in fact empty. This behavior always 
> occurs after lock timeouts when attempting to put, take, rollback, or commit 
> to a FileChannel. Once the timeout occurs, Flume stops cleaning up the old 
> files. I was able to confirm that the Log's writeCheckpoint method was still 
> being called and successfully obtaining a lock from tryLockExclusive(), but I 
> was not able to confirm removeOldLogs being called. The application log did 
> not include "Removing old file: log-xyz" for the old files which the Log 
> class would output if they were correctly being removed. I suspect the lock 
> timeouts were due to high I/O load at the time.
> Some stack traces:
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the 
> log. Try increasing the log write timeout value. [channel=fileChannel]
> at 
> org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doPut(FileChannel.java:478)
> at 
> org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93)
> at 
> org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:80)
> at 
> org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:189)
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the 
> log. Try increasing the log write timeout value. [channel=fileChannel]
> at 
> org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:594)
> at 
> org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
> at 
> dataxu.flume.plugins.avro.AsyncAvroSink.process(AsyncAvroSink.java:548)
> at 
> dataxu.flume.plugins.ClassLoaderFlumeSink.process(ClassLoaderFlumeSink.java:33)
> at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> at java.lang.Thread.run(Thread.java:619)
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the 
> log. Try increasing the log write timeout value. [channel=fileChannel]
> at 
> org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
> at 
> org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
> at 
> org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
> at 
> dataxu.flume.plugins.avro.AvroSource.appendBatch(AvroSource.java:209)
> at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.avro.ipc.specific.SpecificResponder.respond(SpecificResponder.java:91)
> at org.apache.avro.ipc.Responder.respond(Responder.java:151)
> at 
> org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
> at 
> org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:321)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:303)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:220)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstre

[jira] [Updated] (FLUME-2311) Use standard way of finding queue/topic

2014-02-07 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2311:


Fix Version/s: v1.5.0
 Assignee: Hugo Lassiège

I committed this to trunk and 1.5! Thank you very much Hugo! This was a great 
contribution.

> Use standard way of finding queue/topic
> ---
>
> Key: FLUME-2311
> URL: https://issues.apache.org/jira/browse/FLUME-2311
> Project: Flume
>  Issue Type: Bug
>Affects Versions: v1.4.0
>Reporter: Brock Noland
>Assignee: Hugo Lassiège
>  Labels: flume, jms, patch
> Fix For: v1.5.0
>
> Attachments: 0001-Patch-for-FLUME-2311.patch, 
> 0001-Patch-for-FLUME-2311.patch
>
>
> Here 
> https://issues.apache.org/jira/browse/FLUME-924?focusedCommentId=13890651&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13890651
> [~hlassiege] says:
> I'm currently using this jms source to connect on Weblogic message bus. I'm 
> wondering why the JMSMessageConsumer use createQueue and createTopic instead 
> of lookup to find the destinations (line 83 to 90). 
> It seems that "createQueue" or "createTopic" are not the recommended way 
> because it is not portable (I saw that warning in Weblogic documentation even 
> if I can't justify this assertion). 
> The documentation recommends to use a JNDI lookup 
> (http://docs.oracle.com/cd/E23943_01/web./e13727/lookup.htm#BABDFCIC).
> Is there any reason to use createQueue instead of lookup ?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-2311) Use standard way of finding queue/topic

2014-02-06 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13893456#comment-13893456
 ] 

Brock Noland commented on FLUME-2311:
-

Nice! This patch generally looks great! The only issue I see is that the patch 
uses tabs (or 4 spaces) as opposed to 2 spaces. Would you mind reformatting 
the patch?

> Use standard way of finding queue/topic
> ---
>
> Key: FLUME-2311
> URL: https://issues.apache.org/jira/browse/FLUME-2311
> Project: Flume
>  Issue Type: Bug
>Affects Versions: v1.4.0
>Reporter: Brock Noland
>  Labels: flume, jms, patch
> Attachments: 0001-Patch-for-FLUME-2311.patch
>
>
> Here 
> https://issues.apache.org/jira/browse/FLUME-924?focusedCommentId=13890651&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13890651
> [~hlassiege] says:
> I'm currently using this jms source to connect on Weblogic message bus. I'm 
> wondering why the JMSMessageConsumer use createQueue and createTopic instead 
> of lookup to find the destinations (line 83 to 90). 
> It seems that "createQueue" or "createTopic" are not the recommended way 
> because it is not portable (I saw that warning in Weblogic documentation even 
> if I can't justify this assertion). 
> The documentation recommends to use a JNDI lookup 
> (http://docs.oracle.com/cd/E23943_01/web./e13727/lookup.htm#BABDFCIC).
> Is there any reason to use createQueue instead of lookup ?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-2307) Old log data is not cleaned up

2014-02-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892893#comment-13892893
 ] 

Brock Noland commented on FLUME-2307:
-

bq. I have wondered for some time what purpose it is actually serving

To add to my own question... as soon as the timeout occurs, the client is going 
to move on to the next action, be it put or take, and sit waiting on the lock 
again. That is, the timeout doesn't improve the speed at which the 
lock-requiring action occurs.

> Old log data is not cleaned up
> --
>
> Key: FLUME-2307
> URL: https://issues.apache.org/jira/browse/FLUME-2307
> Project: Flume
>  Issue Type: Bug
>  Components: Channel
>Affects Versions: v1.4.0
>Reporter: Steve Zesch
>
> I've observed Flume failing to clean up old log data in FileChannels. The 
> amount of old log data can range anywhere from tens to hundreds of GB. I was 
> able to confirm that the channels were in fact empty. This behavior always 
> occurs after lock timeouts when attempting to put, take, rollback, or commit 
> to a FileChannel. Once the timeout occurs, Flume stops cleaning up the old 
> files. I was able to confirm that the Log's writeCheckpoint method was still 
> being called and successfully obtaining a lock from tryLockExclusive(), but I 
> was not able to confirm removeOldLogs being called. The application log did 
> not include "Removing old file: log-xyz" for the old files which the Log 
> class would output if they were correctly being removed. I suspect the lock 
> timeouts were due to high I/O load at the time.
> Some stack traces:
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the 
> log. Try increasing the log write timeout value. [channel=fileChannel]
> at 
> org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doPut(FileChannel.java:478)
> at 
> org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93)
> at 
> org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:80)
> at 
> org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:189)
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the 
> log. Try increasing the log write timeout value. [channel=fileChannel]
> at 
> org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:594)
> at 
> org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
> at 
> dataxu.flume.plugins.avro.AsyncAvroSink.process(AsyncAvroSink.java:548)
> at 
> dataxu.flume.plugins.ClassLoaderFlumeSink.process(ClassLoaderFlumeSink.java:33)
> at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> at java.lang.Thread.run(Thread.java:619)
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the 
> log. Try increasing the log write timeout value. [channel=fileChannel]
> at 
> org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
> at 
> org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
> at 
> org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
> at 
> dataxu.flume.plugins.avro.AvroSource.appendBatch(AvroSource.java:209)
> at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.avro.ipc.specific.SpecificResponder.respond(SpecificResponder.java:91)
> at org.apache.avro.ipc.Responder.respond(Responder.java:151)
> at 
> org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
> at 
> org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:321)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:303)
> at 
> org.jbo

[jira] [Commented] (FLUME-2307) Old log data is not cleaned up

2014-02-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892648#comment-13892648
 ] 

Brock Noland commented on FLUME-2307:
-

Can we eliminate the write timeout? I have wondered for some time what purpose 
it is actually serving.

> Old log data is not cleaned up
> --
>
> Key: FLUME-2307
> URL: https://issues.apache.org/jira/browse/FLUME-2307
> Project: Flume
>  Issue Type: Bug
>  Components: Channel
>Affects Versions: v1.4.0
>Reporter: Steve Zesch
>
> I've observed Flume failing to clean up old log data in FileChannels. The 
> amount of old log data can range anywhere from tens to hundreds of GB. I was 
> able to confirm that the channels were in fact empty. This behavior always 
> occurs after lock timeouts when attempting to put, take, rollback, or commit 
> to a FileChannel. Once the timeout occurs, Flume stops cleaning up the old 
> files. I was able to confirm that the Log's writeCheckpoint method was still 
> being called and successfully obtaining a lock from tryLockExclusive(), but I 
> was not able to confirm removeOldLogs being called. The application log did 
> not include "Removing old file: log-xyz" for the old files which the Log 
> class would output if they were correctly being removed. I suspect the lock 
> timeouts were due to high I/O load at the time.
> Some stack traces:
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the 
> log. Try increasing the log write timeout value. [channel=fileChannel]
> at 
> org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doPut(FileChannel.java:478)
> at 
> org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93)
> at 
> org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:80)
> at 
> org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:189)
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the 
> log. Try increasing the log write timeout value. [channel=fileChannel]
> at 
> org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:594)
> at 
> org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
> at 
> dataxu.flume.plugins.avro.AsyncAvroSink.process(AsyncAvroSink.java:548)
> at 
> dataxu.flume.plugins.ClassLoaderFlumeSink.process(ClassLoaderFlumeSink.java:33)
> at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> at java.lang.Thread.run(Thread.java:619)
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the 
> log. Try increasing the log write timeout value. [channel=fileChannel]
> at 
> org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
> at 
> org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
> at 
> org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
> at 
> dataxu.flume.plugins.avro.AvroSource.appendBatch(AvroSource.java:209)
> at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.avro.ipc.specific.SpecificResponder.respond(SpecificResponder.java:91)
> at org.apache.avro.ipc.Responder.respond(Responder.java:151)
> at 
> org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
> at 
> org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:321)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:303)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:220)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
> at 

[jira] [Commented] (FLUME-2277) Improve FileChannel documentation to address commons support issues

2014-02-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892647#comment-13892647
 ] 

Brock Noland commented on FLUME-2277:
-

[~roshan_naik]

Why do we feel 10 seconds is high? I actually wish we had put it at some 
number quite large, like five minutes or something. What is the harm in having 
a high write timeout?

In other news this bug FLUME-2307 was caused in part by write timeouts.

> Improve FileChannel documentation to address commons support issues
> ---
>
> Key: FLUME-2277
> URL: https://issues.apache.org/jira/browse/FLUME-2277
> Project: Flume
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: FLUME-2277.patch
>
>
> Often users configure too small of batch size with File Channel, use sources 
> such as Exec source which generate small batches, or do not configure 
> multiple disks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-2311) Use standard way of finding queue/topic

2014-02-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892641#comment-13892641
 ] 

Brock Noland commented on FLUME-2311:
-

Thank you!!

> Use standard way of finding queue/topic
> ---
>
> Key: FLUME-2311
> URL: https://issues.apache.org/jira/browse/FLUME-2311
> Project: Flume
>  Issue Type: Bug
>Reporter: Brock Noland
>
> Here 
> https://issues.apache.org/jira/browse/FLUME-924?focusedCommentId=13890651&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13890651
> [~hlassiege] says:
> I'm currently using this jms source to connect on Weblogic message bus. I'm 
> wondering why the JMSMessageConsumer use createQueue and createTopic instead 
> of lookup to find the destinations (line 83 to 90). 
> It seems that "createQueue" or "createTopic" are not the recommended way 
> because it is not portable (I saw that warning in Weblogic documentation even 
> if I can't justify this assertion). 
> The documentation recommends to use a JNDI lookup 
> (http://docs.oracle.com/cd/E23943_01/web./e13727/lookup.htm#BABDFCIC).
> Is there any reason to use createQueue instead of lookup ?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-2311) Use standard way of finding queue/topic

2014-02-04 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890901#comment-13890901
 ] 

Brock Noland commented on FLUME-2311:
-

[~hlassiege], would you be in a position to make the change and see if it fixes 
your issues?

> Use standard way of finding queue/topic
> ---
>
> Key: FLUME-2311
> URL: https://issues.apache.org/jira/browse/FLUME-2311
> Project: Flume
>  Issue Type: Bug
>Reporter: Brock Noland
>
> Here 
> https://issues.apache.org/jira/browse/FLUME-924?focusedCommentId=13890651&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13890651
> [~hlassiege] says:
> I'm currently using this jms source to connect on Weblogic message bus. I'm 
> wondering why the JMSMessageConsumer use createQueue and createTopic instead 
> of lookup to find the destinations (line 83 to 90). 
> It seems that "createQueue" or "createTopic" are not the recommended way 
> because it is not portable (I saw that warning in Weblogic documentation even 
> if I can't justify this assertion). 
> The documentation recommends to use a JNDI lookup 
> (http://docs.oracle.com/cd/E23943_01/web./e13727/lookup.htm#BABDFCIC).
> Is there any reason to use createQueue instead of lookup ?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (FLUME-2311) Use standard way of finding queue/topic

2014-02-04 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2311:
---

 Summary: Use standard way of finding queue/topic
 Key: FLUME-2311
 URL: https://issues.apache.org/jira/browse/FLUME-2311
 Project: Flume
  Issue Type: Bug
Reporter: Brock Noland


Here 
https://issues.apache.org/jira/browse/FLUME-924?focusedCommentId=13890651&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13890651

[~hlassiege] says:

I'm currently using this jms source to connect on Weblogic message bus. I'm 
wondering why the JMSMessageConsumer use createQueue and createTopic instead of 
lookup to find the destinations (line 83 to 90). 
It seems that "createQueue" or "createTopic" are not the recommended way 
because it is not portable (I saw that warning in Weblogic documentation even 
if I can't justify this assertion). 
The documentation recommends to use a JNDI lookup 
(http://docs.oracle.com/cd/E23943_01/web./e13727/lookup.htm#BABDFCIC).
Is there any reason to use createQueue instead of lookup ?
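
A minimal sketch of the lookup-based alternative (plain JMS/JNDI, not an actual 
patch):

{code}
// instead of: destination = session.createQueue(destinationName);
Destination destination = (Destination) initialContext.lookup(destinationName);
{code}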




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-924) Implement a JMS source for Flume NG

2014-02-04 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890899#comment-13890899
 ] 

Brock Noland commented on FLUME-924:


Hi,

There is no particular reason...I created FLUME-2311 for your suggestion. I 
will comment there.

> Implement a JMS source for Flume NG
> ---
>
> Key: FLUME-924
> URL: https://issues.apache.org/jira/browse/FLUME-924
> Project: Flume
>  Issue Type: New Feature
>Affects Versions: v1.0.0
>Reporter: Bruno Mahé
>Assignee: Brock Noland
> Fix For: v1.4.0
>
> Attachments: FLUME-924-1.patch, FLUME-924-2.patch, FLUME-924-3.patch, 
> FLUME-924-5.patch, FLUME-924-6.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-2181) Optionally disable File Channel fsyncs

2014-01-28 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884367#comment-13884367
 ] 

Brock Noland commented on FLUME-2181:
-

bq. The first one you mentioned is what I intended

Yep, that makes sense.

bq. The last one is a bug, we should throw an exception only if 
fsyncPerTransaction is true.

I think in the case of operation != OP_RECORD we always want to throw. That is 
just bad data in both cases.

> Optionally disable File Channel fsyncs 
> ---
>
> Key: FLUME-2181
> URL: https://issues.apache.org/jira/browse/FLUME-2181
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2181-1.patch, FLUME-2181.patch
>
>
> This will give File Channel performance a big boost, at the cost of possible 
> data loss if a crash happens between checkpoints. 
> Also we should make it configurable, with default to false. If the user does 
> not mind slight inconsistencies, this feature can be explicitly enabled 
> through configuration. So if it is not configured, then the behavior will be 
> exactly as it is now.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (FLUME-1503) TestFileChannel needs to write to tmp folder inside target directory

2014-01-27 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland resolved FLUME-1503.
-

Resolution: Won't Fix

Thank you Ashish! Since no one else has reported this, let's close it for now.

> TestFileChannel needs to write to tmp folder inside target directory
> 
>
> Key: FLUME-1503
> URL: https://issues.apache.org/jira/browse/FLUME-1503
> Project: Flume
>  Issue Type: Bug
>  Components: Channel
>Affects Versions: v1.3.0
>Reporter: Mubarak Seyed
>Assignee: Ashish Paliwal
>
> It appears from test that TestFileChannel fails to create files in /tmp 
> directory due to permission issues in Mac.
> Can we create files in target directory?
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on 
> project flume-file-channel: Failure or timeout -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) 
> on project flume-file-channel: Failure or timeout
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:217)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>   at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
>   at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
>   at 
> org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
>   at 
> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
>   at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:319)
>   at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
>   at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
>   at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
>   at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
>   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
>   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
>   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
> Caused by: org.apache.maven.plugin.MojoExecutionException: Failure or timeout
>   at 
> org.apache.maven.plugin.surefire.SurefirePlugin.assertNoFailureOrTimeout(SurefirePlugin.java:665)
>   at 
> org.apache.maven.plugin.surefire.SurefirePlugin.handleSummary(SurefirePlugin.java:646)
>   at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:137)
>   at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:98)
>   at 
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
>   ... 19 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-2181) Optionally disable File Channel fsyncs

2014-01-24 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881421#comment-13881421
 ] 

Brock Noland commented on FLUME-2181:
-

Thank you very much for the new exception types! Can you make an RB item with 
the next update?

I think this is wrong. I think we don't want to throw an IOException when 
fsyncPerTransaction == true. We should also have a test for this particular 
condition.

{noformat}
-  open = false;
-  throw new IOException("Corrupt event found. Please run File Channel " +
-"Integrity tool.", ex);
+  if (fsyncPerTransaction) {
+open = false;
+throw new IOException("Corrupt event found. Please run File Channel " +
+  "Integrity tool.", ex);
+  }
+  throw ex;
{noformat}

I think the below has to catch Throwable, because the scheduled executor catches 
it and then eats it.
{noformat}
+try {
+  sync();
+} catch (Exception ex) {
+  LOG.error("Data file, " + getFile().toString() + " could not " +
+"be synced to disk due to an error.", ex);
+}
{noformat}

if(LOG.isDebugEnabled()) can be added here:
{noformat}
+LOG.debug("No events written to file, " + getFile().toString() +
+  " in last " + fsyncInterval + " or since last commit.");
{noformat}

Preconditions.checkNotNull
{noformat}
+  syncExecutor.shutdown(); // No need to wait for it to shutdown.
{noformat}

is this really what we want? I think we want to throw an exception regardless 
of fsyncPerTransaction.
{noformat}
+if(operation != OP_RECORD) {
+  if (!fsyncPerTransaction) {
+    throw new CorruptEventException("Operation code is invalid. File " +
+      "is corrupt. Please run File Channel Integrity tool.");
+  }
+}
{noformat}

> Optionally disable File Channel fsyncs 
> ---
>
> Key: FLUME-2181
> URL: https://issues.apache.org/jira/browse/FLUME-2181
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2181-1.patch, FLUME-2181.patch
>
>
> This will give File Channel performance a big boost, at the cost of possible 
> data loss if a crash happens between checkpoints. 
> Also we should make it configurable, with default to false. If the user does 
> not mind slight inconsistencies, this feature can be explicitly enabled 
> through configuration. So if it is not configured, then the behavior will be 
> exactly as it is now.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-2095) JMS source with TIBCO

2014-01-16 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873747#comment-13873747
 ] 

Brock Noland commented on FLUME-2095:
-

bq. I have tried several combinations in the parameter file without success. 
Has anyone gotten this to work too?

Can you share the configs you tried? AFAIK the JMS source supports username and 
password properties.
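
For reference, a hedged example (property names per the JMS source docs; values 
hypothetical):

{noformat}
a1.sources.r1.userName = flumeuser
a1.sources.r1.passwordFile = /etc/flume/jms.password
{noformat}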

> JMS source with TIBCO
> -
>
> Key: FLUME-2095
> URL: https://issues.apache.org/jira/browse/FLUME-2095
> Project: Flume
>  Issue Type: Question
>  Components: Sinks+Sources
>Affects Versions: v1.3.1
> Environment: Windows 7
>Reporter: Bhaskar Reddy
>Priority: Critical
>
> Hi,
> I was trying to use the JMS source to work with TIBCO, but I am encountering 
> the exception below,
> org.apache.flume.FlumeException: Could not lookup ConnectionFactory
>   at org.apache.flume.source.jms.JMSSource.doConfigure(JMSSource.java:222)
>   at 
> org.apache.flume.source.BasicSourceSemantics.configure(BasicSourceSemantics.java:65)
>   at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
>   at 
> org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:331)
>   at 
> org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
>   at 
> org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>   at 
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>   at java.lang.Thread.run(Thread.java:662)
> Caused by: javax.naming.NameNotFoundException: Name not found: 
> 'com.tibco.tibjms.TibjmsQueueConnectionFactory'
>   at com.tibco.tibjms.naming.TibjmsContext.lookup(TibjmsContext.java:715)
>   at com.tibco.tibjms.naming.TibjmsContext.lookup(TibjmsContext.java:491)
>   at javax.naming.InitialContext.lookup(InitialContext.java:392)
>   at org.apache.flume.source.jms.JMSSource.doConfigure(JMSSource.java:219)
>   ... 14 more
> Please find my configuration below,
> a1.sources.r1.type = jms
> a1.sources.r1.channels = c1
> a1.sources.r1.initialContextFactory = 
> com.tibco.tibjms.naming.TibjmsInitialContextFactory
> a1.sources.r1.connectionFactory = 
> com.tibco.tibjms.TibjmsQueueConnectionFactory
> a1.sources.r1.providerURL = tibjmsnaming://localhost:7222
> a1.sources.r1.destinationName = sample
> a1.sources.r1.destinationType = QUEUE
> I tried changing the configuration below,
> a1.sources.r1.type = jms
> a1.sources.r1.channels = c1
> a1.sources.r1.initialContextFactory = 
> com.tibco.tibjms.naming.TibjmsInitialContextFactory
> a1.sources.r1.providerURL = tcp://localhost:7222
> a1.sources.r1.destinationName = sample
> a1.sources.r1.destinationType = QUEUE
> Thanks,
> Bhaskar.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-2277) Improve FileChannel documentation to address commons support issues

2014-01-09 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866691#comment-13866691
 ] 

Brock Noland commented on FLUME-2277:
-

Thanks Roshan! Should we commit this and then add the performance items in a 
follow-on?

> Improve FileChannel documentation to address commons support issues
> ---
>
> Key: FLUME-2277
> URL: https://issues.apache.org/jira/browse/FLUME-2277
> Project: Flume
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: FLUME-2277.patch
>
>
> Often users configure too small of batch size with File Channel, use sources 
> such as Exec source which generate small batches, or do not configure 
> multiple disks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (FLUME-2277) Improve FileChannel documentation to address commons support issues

2013-12-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2277:


Summary: Improve FileChannel documentation to address commons support 
issues  (was: Improve FileChannel documentation to address commons upport 
issues)

> Improve FileChannel documentation to address commons support issues
> ---
>
> Key: FLUME-2277
> URL: https://issues.apache.org/jira/browse/FLUME-2277
> Project: Flume
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: FLUME-2277.patch
>
>
> Often users configure too small of batch size with File Channel, use sources 
> such as Exec source which generate small batches, or do not configure 
> multiple disks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-2285) FileChannel Erroneously Reports "Usable Space Exhausted"

2013-12-30 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858882#comment-13858882
 ] 

Brock Noland commented on FLUME-2285:
-

bq. The drive that the checkpoint and data directories reside on has 86GB 
available (unused) on it. The flume application is calculating the wrong number 
for usable space.

Flume is using the standard Java APIs to get this information. Are you sure 
that Flume is actually storing its logs on the disk with free space?

bq. If I am wrong here, then the exception message should at least be more 
clear, 

I don't disagree, but it is *much* better than if we didn't have this check :) 
Without the check, users got one of a half-dozen runtime errors.

bq. describing how to fix the issue, and where exactly it is getting that 
number (99401728 bytes). It would also be nice if the numbers were reported in 
human readable form (MB, GB, etc.)

That would be cool! The line of code requiring a change is here:

https://github.com/apache/flume/blob/trunk/flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Log.java#L610
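A minimal sketch of such a formatter, purely illustrative (the method name and 
placement are not from Log.java):

{code}
// Illustrative helper: render a byte count in binary (1024-based) units
// so the "usable space" error message is human readable.
private static String humanReadable(long bytes) {
  if (bytes < 1024L) {
    return bytes + " B";
  }
  int exp = (int) (Math.log(bytes) / Math.log(1024));
  return String.format("%.1f %sB", bytes / Math.pow(1024, exp),
      "KMGTPE".charAt(exp - 1));
}
{code}

With that, the message above would read roughly "only 94.8 MB remaining, 
required 500.0 MB".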


> FileChannel Erroneously Reports "Usable Space Exhausted"
> 
>
> Key: FLUME-2285
> URL: https://issues.apache.org/jira/browse/FLUME-2285
> Project: Flume
>  Issue Type: Bug
>Affects Versions: v1.4.0
>Reporter: Michael Knapp
>
> I am using Flume 1.4.0, my configuration has a file channel with all of the 
> default settings.  Its checkpoint directory and data directory are both 
> empty and have the correct permissions on them.  When I run flume I get this 
> exception:
> java.lang.IllegalStateException: Channel closed [...]. Due to 
> java.io.IOException: Usable space exhaused, only 99401728 bytes remaining, 
> required 524288000 bytes
> The drive that the checkpoint and data directories reside on has 86GB 
> available (unused) on it.  The flume application is calculating the wrong 
> number for usable space.
> If I am wrong here, then the exception message should at least be more clear, 
> describing how to fix the issue, and where exactly it is getting that number 
> (99401728 bytes).  It would also be nice if the numbers were reported in 
> human readable form (MB, GB, etc.)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel

2013-12-26 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857070#comment-13857070
 ] 

Brock Noland commented on FLUME-1227:
-

Thank you for addressing the feedback!  I am OK with your reasoning regarding 
adding dual checkpointing to the example. I haven't looked at this code and 
review in detail. It looks like Hari has, so I think he'll have to make the 
call on when to commit.

Thank you for your hard work Roshan!

> Introduce some sort of SpillableChannel
> ---
>
> Key: FLUME-1227
> URL: https://issues.apache.org/jira/browse/FLUME-1227
> Project: Flume
>  Issue Type: New Feature
>  Components: Channel
>Reporter: Jarek Jarcec Cecho
>Assignee: Roshan Naik
> Attachments: 1227.patch.1, FLUME-1227.v2.patch, FLUME-1227.v5.patch, 
> FLUME-1227.v6.patch, FLUME-1227.v7.patch, FLUME-1227.v8.patch, 
> FLUME-1227.v9.patch, SpillableMemory Channel Design 2.pdf, SpillableMemory 
> Channel Design.pdf
>
>
> I would like to introduce new channel that would behave similarly as scribe 
> (https://github.com/facebook/scribe). It would be something between memory 
> and file channel. Input events would be saved directly to the memory (only) 
> and would be served from there. In case that the memory would be full, we 
> would outsource the events to file.
> Let me describe the use case behind this request. We have plenty of frontend 
> servers that are generating events. We want to send all events to just 
> limited number of machines from where we would send the data to HDFS (some 
> sort of staging layer). Reason for this second layer is our need to decouple 
> event aggregation and front end code to separate machines. Using memory 
> channel is fully sufficient as we can survive lost of some portion of the 
> events. However in order to sustain maintenance windows or networking issues 
> we would have to end up with a lot of memory assigned to those "staging" 
> machines. Referenced "scribe" is dealing with this problem by implementing 
> following logic - events are saved in memory similarly as our MemoryChannel. 
> However in case that the memory gets full (because of maintenance, networking 
> issues, ...) it will spill data to disk where they will be sitting until 
> everything start working again.
> I would like to introduce channel that would implement similar logic. It's 
> durability guarantees would be same as MemoryChannel - in case that someone 
> would remove power cord, this channel would lose data. Based on the 
> discussion in FLUME-1201, I would propose to have the implementation 
> completely independent on any other channel internal code.
> Jarcec



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (FLUME-1491) Dynamic configuration from Zookeeper watcher

2013-12-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852921#comment-13852921
 ] 

Brock Noland commented on FLUME-1491:
-

Hi Ashish,

Great work!! Thank you very much for contributing to Flume! I have two concerns 
with the current approach:

1) I think the big use case for pulling configuration from ZK would be 
automatic reconfiguration, but this patch doesn't implement that (a sketch of 
what that could look like follows below).
2) The patch stores the entire contents of the file in a single znode, which 
has a 1MB size limit by default.
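To make concern 1 concrete, here is a rough, purely illustrative sketch of the 
watch-and-reload loop that automatic reconfiguration implies (the znode path 
and the reload hook are assumptions):

{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Illustrative only: watch the config znode and re-read it on change.
public class ZkConfigWatcher implements Watcher {
  private final ZooKeeper zk;

  public ZkConfigWatcher(String quorum) throws Exception {
    zk = new ZooKeeper(quorum, 30000, this);
  }

  public byte[] readConfig() throws Exception {
    // Passing "this" re-registers the watch on every read, so
    // NodeDataChanged events keep firing after each change.
    return zk.getData("/flume/agent1.properties", this, null);
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getType() == Event.EventType.NodeDataChanged) {
      // Hypothetical hook: ask the configuration provider to reload.
    }
  }
}
{code}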

> Dynamic configuration from Zookeeper watcher
> 
>
> Key: FLUME-1491
> URL: https://issues.apache.org/jira/browse/FLUME-1491
> Project: Flume
>  Issue Type: Improvement
>  Components: Configuration
>Affects Versions: v1.2.0
>Reporter: Denny Ye
>Assignee: Ashish Paliwal
>  Labels: Zookeeper
> Attachments: FLUME-1491-2.patch, FLUME-1491-3.patch, 
> FLUME-1491-4.patch, FLUME-1491-5.patch
>
>
> Currently, Flume only support file-level dynamic configuration. Another 
> frequent usage in practical environment, we would like to manage 
> configuration with Zookeeper, and modify configuration from Web UI to stored 
> file in Zookeeper. 
> Flume should support this method with Zookeeper watcher.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (FLUME-2277) Improve FileChannel documentation to address commons upport issues

2013-12-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851959#comment-13851959
 ] 

Brock Noland commented on FLUME-2277:
-

[~hshreedharan], attached is a doc update with some small code changes

> Improve FileChannel documentation to address commons upport issues
> --
>
> Key: FLUME-2277
> URL: https://issues.apache.org/jira/browse/FLUME-2277
> Project: Flume
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: FLUME-2277.patch
>
>
> Often users configure too small of batch size with File Channel, use sources 
> such as Exec source which generate small batches, or do not configure 
> multiple disks.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (FLUME-2277) Improve FileChannel documentation to address commons upport issues

2013-12-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2277:


Attachment: FLUME-2277.patch

> Improve FileChannel documentation to address commons upport issues
> --
>
> Key: FLUME-2277
> URL: https://issues.apache.org/jira/browse/FLUME-2277
> Project: Flume
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: FLUME-2277.patch
>
>
> Often users configure too small of batch size with File Channel, use sources 
> such as Exec source which generate small batches, or do not configure 
> multiple disks.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (FLUME-2277) Improve FileChannel documentation to address commons upport issues

2013-12-18 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2277:
---

 Summary: Improve FileChannel documentation to address commons 
upport issues
 Key: FLUME-2277
 URL: https://issues.apache.org/jira/browse/FLUME-2277
 Project: Flume
  Issue Type: Task
Reporter: Brock Noland
Assignee: Brock Noland


Often users configure too small of batch size with File Channel, use sources 
such as Exec source which generate small batches, or do not configure multiple 
disks.
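A hedged example of the guidance this implies (agent/component names and paths 
are illustrative):

{noformat}
# Larger sink batches amortize the cost of each file channel commit.
agent.sinks.k1.hdfs.batchSize = 1000
# Checkpoint and data directories on separate physical disks.
agent.channels.c1.checkpointDir = /disk1/flume/checkpoint
agent.channels.c1.dataDirs = /disk2/flume/data,/disk3/flume/data
{noformat}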



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel

2013-12-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851768#comment-13851768
 ] 

Brock Noland commented on FLUME-1227:
-

Hey, I have not participated in the review until now, so sorry about this... 
but I just noticed the following items, which are mostly "nits" and 
improvements.

SpillableMemoryChannel
1. Static stuff should be at the top
2. Constructor should be directly below fields
3. String constants should be static final fields with javadoc description
4. Stuff can be final:
{noformat}
private Object queueLock = new Object();
{noformat}
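e.g.:

{noformat}
private final Object queueLock = new Object();
{noformat}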

TestSpillableMemoryChannel
1. The take-null test has a commented-out assertion
2. There are locations where we expect "Exception" that should expect a 
specific type of exception
3. Let's not use e.printStackTrace();
4. Places where we assert a boolean should have a message
5. Many missing spaces, such as:
{noformat}
for (int i=0; i<...
{noformat}

> Introduce some sort of SpillableChannel
> ---
>
> Key: FLUME-1227
> URL: https://issues.apache.org/jira/browse/FLUME-1227
> Project: Flume
>  Issue Type: New Feature
>  Components: Channel
>Reporter: Jarek Jarcec Cecho
>Assignee: Roshan Naik
> Attachments: 1227.patch.1, FLUME-1227.v2.patch, FLUME-1227.v5.patch, 
> FLUME-1227.v6.patch, FLUME-1227.v7.patch, FLUME-1227.v8.patch, 
> SpillableMemory Channel Design 2.pdf, SpillableMemory Channel Design.pdf
>
>
> I would like to introduce new channel that would behave similarly as scribe 
> (https://github.com/facebook/scribe). It would be something between memory 
> and file channel. Input events would be saved directly to the memory (only) 
> and would be served from there. In case that the memory would be full, we 
> would outsource the events to file.
> Let me describe the use case behind this request. We have plenty of frontend 
> servers that are generating events. We want to send all events to just 
> limited number of machines from where we would send the data to HDFS (some 
> sort of staging layer). Reason for this second layer is our need to decouple 
> event aggregation and front end code to separate machines. Using memory 
> channel is fully sufficient as we can survive lost of some portion of the 
> events. However in order to sustain maintenance windows or networking issues 
> we would have to end up with a lot of memory assigned to those "staging" 
> machines. Referenced "scribe" is dealing with this problem by implementing 
> following logic - events are saved in memory similarly as our MemoryChannel. 
> However in case that the memory gets full (because of maintenance, networking 
> issues, ...) it will spill data to disk where they will be sitting until 
> everything start working again.
> I would like to introduce channel that would implement similar logic. It's 
> durability guarantees would be same as MemoryChannel - in case that someone 
> would remove power cord, this channel would lose data. Based on the 
> discussion in FLUME-1201, I would propose to have the implementation 
> completely independent on any other channel internal code.
> Jarcec



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (FLUME-2181) Optionally disable File Channel fsyncs

2013-12-17 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850563#comment-13850563
 ] 

Brock Noland commented on FLUME-2181:
-

bq.  ignoring bad data essentially means ignoring the rest of the file

Agreed... but when you turn this option on, you agree that losing data is 
acceptable. However, once we have the new format, I think we should go back and 
add a seekNext() method which can be called by the sequential reader in the 
case of bad data. Your thoughts?
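A rough sketch of the seekNext() idea, purely illustrative (the real scan would 
be driven by the new format's record framing; OP_RECORD is the existing record 
opcode):

{code}
// Illustrative only: after hitting a corrupt event, slide forward one
// byte at a time until the next plausible record marker instead of
// failing the whole replay.
boolean seekNext(RandomAccessFile file) throws IOException {
  while (file.getFilePointer() < file.length()) {
    long pos = file.getFilePointer();
    if (file.readByte() == OP_RECORD) {
      file.seek(pos); // rewind to the candidate record start
      return true;
    }
  }
  return false;
}
{code}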

> Optionally disable File Channel fsyncs 
> ---
>
> Key: FLUME-2181
> URL: https://issues.apache.org/jira/browse/FLUME-2181
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2181.patch
>
>
> This will give File Channel performance a big boost, at the cost of possible 
> data loss if a crash happens between checkpoints. 
> Also we should make it configurable, with default to false. If the user does 
> not mind slight inconsistencies, this feature can be explicitly enabled 
> through configuration. So if it is not configured, then the behavior will be 
> exactly as it is now.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (FLUME-2181) Optionally disable File Channel fsyncs

2013-12-16 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850079#comment-13850079
 ] 

Brock Noland commented on FLUME-2181:
-

Ideally I'd only like to implement this if we implement the new format, since 
it will have a built-in checksum.

In the case of disabling fsyncs, I think both take() of a bad event and the 
sequential reader should just ignore bad data.

> Optionally disable File Channel fsyncs 
> ---
>
> Key: FLUME-2181
> URL: https://issues.apache.org/jira/browse/FLUME-2181
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2181.patch
>
>
> This will give File Channel performance a big boost, at the cost of possible 
> data loss if a crash happens between checkpoints. 
> Also we should make it configurable, with default to false. If the user does 
> not mind slight inconsistencies, this feature can be explicitly enabled 
> through configuration. So if it is not configured, then the behavior will be 
> exactly as it is now.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (FLUME-2155) Improve replay time

2013-12-15 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2155:


Fix Version/s: v1.5.0

> Improve replay time
> ---
>
> Key: FLUME-2155
> URL: https://issues.apache.org/jira/browse/FLUME-2155
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Brock Noland
> Fix For: v1.5.0
>
> Attachments: 1-2, 10-11, 30-31, 
> 70-71, FLUME-2155-initial.patch, FLUME-2155.2.patch, 
> FLUME-2155.4.patch, FLUME-2155.5.patch, FLUME-2155.patch, 
> FLUME-FC-SLOW-REPLAY-1.patch, FLUME-FC-SLOW-REPLAY-FIX-1.patch, 
> SmartReplay.pdf, SmartReplay1.1.pdf, fc-test.patch
>
>
> File Channel has scaled so well that people now run channels with sizes in 
> 100's of millions of events. Turns out, replay can be crazy slow even between 
> checkpoints at this scale - because of the remove() method in FlumeEventQueue 
> moving every pointer that follows the one being removed (1 remove causes 99 
> million+ moves for a channel of 100 million!). There are several ways of 
> improving - one being move at the end of replay - sort of like a compaction. 
> Another is to use the fact that all removes happen from the top of the queue, 
> so move the first "k" events out to hashset and remove from there - we can 
> find k using the write id of the last checkpoint and the current one. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (FLUME-2118) Occasional multi-hour pauses in file channel replay

2013-12-15 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland resolved FLUME-2118.
-

Resolution: Duplicate

This is a duplicate of FLUME-2155 which is now resolved.

> Occasional multi-hour pauses in file channel replay
> ---
>
> Key: FLUME-2118
> URL: https://issues.apache.org/jira/browse/FLUME-2118
> Project: Flume
>  Issue Type: Bug
>  Components: File Channel
>Affects Versions: v1.5.0
>Reporter: Juhani Connolly
> Attachments: flume-log, flume-thread-dump, gc-flume.log.20130702
>
>
> Sometimes during replay, immediately after an EOF of one log, the replay will 
> pause for a long time.
> Here are two samples from this morning when we restarted our 3 aggregators 
> and 2 of them hit this issue.
> 02 7 2013 03:06:30,089 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 220 
> records
> 02 7 2013 03:06:30,179 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 221 
> records
> 02 7 2013 03:06:30,241 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.LogFile$SequentialReader.next:505)  - 
> Encountered EOF at 1623195625 in /data2/flume-data/log-1184
> 02 7 2013 06:23:27,629 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 222 
> records
> 02 7 2013 06:23:28,641 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 223 
> records
> 02 7 2013 06:23:29,162 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 224 
> records
> 02 7 2013 06:23:30,118 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 225 
> records
> 02 7 2013 06:23:30,750 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 226 
> records
> 02 7 2013 08:03:00,942 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 216 
> records
> 02 7 2013 08:03:01,055 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 217 
> records
> 02 7 2013 08:03:01,168 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 218 
> records
> 02 7 2013 08:03:01,181 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.LogFile$SequentialReader.next:505)  - 
> Encountered EOF at 1623195640 in /data2/flume-data/log-1182
> 02 7 2013 14:45:55,302 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 219 
> records
> 02 7 2013 14:45:56,282 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 220 
> records
> 02 7 2013 14:45:57,084 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 221 
> records
> 02 7 2013 14:45:59,043 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 222 
> records
> I've tried for an hour and some to track down the cause of this. There's 
> nothing suspicious turning up on ganglia, and a cursory review of the code 
> didn't turn up anything overly suspicious. Owing to time limitations I can't 
> dig further at this time.
> We run a version of flume from somewhat before the current 1.4 release 
> candidate(hash is eefefa941a60c0982f0957804be0cafb4d83e46e) there doesn't 
> appear to be any replay patches since then.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (FLUME-2268) Increase default transactionCapacity for MemoryChannel from 100 to 10000

2013-12-14 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848385#comment-13848385
 ] 

Brock Noland commented on FLUME-2268:
-

I agree that we should remove that check from file channel and memory channel. 
It's a huge pain for our users.

> Increase default transactionCapacity for MemoryChannel from 100 to 10000
> 
>
> Key: FLUME-2268
> URL: https://issues.apache.org/jira/browse/FLUME-2268
> Project: Flume
>  Issue Type: Improvement
>  Components: Channel
>Affects Versions: v1.4.0
>Reporter: Udai Kiran Potluri
>Assignee: Udai Kiran Potluri
>Priority: Minor
> Attachments: FLUME-2268-0.patch
>
>
> The current MemoryChannel default value is 100. Increasing it to 10000 would 
> be useful.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (FLUME-2271) Log4j appender source can cause NullPointerException even in Unsafe mode

2013-12-13 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848223#comment-13848223
 ] 

Brock Noland commented on FLUME-2271:
-

+1

> Log4j appender source can cause NullPointerException even in Unsafe mode
> 
>
> Key: FLUME-2271
> URL: https://issues.apache.org/jira/browse/FLUME-2271
> Project: Flume
>  Issue Type: Bug
>  Components: Client SDK
>Affects Versions: v1.4.0
>Reporter: Mubashir Kazia
> Attachments: FLUME-2271.patch
>
>
> If a client program is configured to use Log4J appender in unsafe mode and if 
> the source was available when the client program started but became 
> unavailable afterwards, Log4J appender can cause a NullPointerException. The 
> stack trace is as follows.
> 2013-12-11 15:20:36,619 ERROR [STDERR] (Thread-23) Exception in thread 
> "Thread-23"
> 2013-12-11 15:20:36,620 ERROR [STDERR] (Thread-23) 
> java.lang.NullPointerException
> 2013-12-11 15:20:36,620 ERROR [STDERR] (Thread-23)  at 
> org.apache.flume.clients.log4jappender.Log4jAppender.append(Log4jAppender.java:163)
> 2013-12-11 15:20:36,620 ERROR [STDERR] (Thread-23)  at 
> org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:230)
> 2013-12-11 15:20:36,620 ERROR [STDERR] (Thread-23)  at 
> org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:65)
> 2013-12-11 15:20:36,620 ERROR [STDERR] (Thread-23)  at 
> org.apache.log4j.Category.callAppenders(Category.java:203)
> 2013-12-11 15:20:36,620 ERROR [STDERR] (Thread-23)  at 
> org.apache.log4j.Category.forcedLog(Category.java:388)
> 2013-12-11 15:20:36,620 ERROR [STDERR] (Thread-23)  at 
> org.apache.log4j.Category.log(Category.java:853)
> I have developed a fix that'll will address this problem. I'll attach the 
> patch shortly.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (FLUME-2155) Improve replay time

2013-12-13 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2155:


Attachment: FLUME-2155.5.patch

> Improve replay time
> ---
>
> Key: FLUME-2155
> URL: https://issues.apache.org/jira/browse/FLUME-2155
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Brock Noland
> Attachments: 1-2, 10-11, 30-31, 
> 70-71, FLUME-2155-initial.patch, FLUME-2155.2.patch, 
> FLUME-2155.4.patch, FLUME-2155.5.patch, FLUME-2155.patch, 
> FLUME-FC-SLOW-REPLAY-1.patch, FLUME-FC-SLOW-REPLAY-FIX-1.patch, 
> SmartReplay.pdf, SmartReplay1.1.pdf, fc-test.patch
>
>
> File Channel has scaled so well that people now run channels with sizes in 
> 100's of millions of events. Turns out, replay can be crazy slow even between 
> checkpoints at this scale - because of the remove() method in FlumeEventQueue 
> moving every pointer that follows the one being removed (1 remove causes 99 
> million+ moves for a channel of 100 million!). There are several ways of 
> improving - one being move at the end of replay - sort of like a compaction. 
> Another is to use the fact that all removes happen from the top of the queue, 
> so move the first "k" events out to hashset and remove from there - we can 
> find k using the write id of the last checkpoint and the current one. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (FLUME-1860) Remove transaction capacity from Memory and File channels

2013-12-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846941#comment-13846941
 ] 

Brock Noland commented on FLUME-1860:
-

Seems like we had an agreement on setting the default to Math.min(channel 
capacity, 10000)...

This breaks many users, so unless there is an objection I will submit a patch. 

> Remove transaction capacity from Memory and File channels
> -
>
> Key: FLUME-1860
> URL: https://issues.apache.org/jira/browse/FLUME-1860
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>
> Transaction Capacity was primarily meant to be a memory safeguard. It ensures 
> that we don't have queues which are massive and can cause OOMs. I wonder if 
> there is a way of fixing this and making sure a malicious RpcClient cannot 
> cause OOMs by sending batches of huge sizes



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (FLUME-2155) Improve replay time

2013-12-10 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2155:


Attachment: FLUME-2155.4.patch

> Improve replay time
> ---
>
> Key: FLUME-2155
> URL: https://issues.apache.org/jira/browse/FLUME-2155
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Brock Noland
> Attachments: 1-2, 10-11, 30-31, 
> 70-71, FLUME-2155-initial.patch, FLUME-2155.2.patch, 
> FLUME-2155.4.patch, FLUME-2155.patch, FLUME-FC-SLOW-REPLAY-1.patch, 
> FLUME-FC-SLOW-REPLAY-FIX-1.patch, SmartReplay.pdf, SmartReplay1.1.pdf, 
> fc-test.patch
>
>
> File Channel has scaled so well that people now run channels with sizes in 
> 100's of millions of events. Turns out, replay can be crazy slow even between 
> checkpoints at this scale - because of the remove() method in FlumeEventQueue 
> moving every pointer that follows the one being removed (1 remove causes 99 
> million+ moves for a channel of 100 million!). There are several ways of 
> improving - one being move at the end of replay - sort of like a compaction. 
> Another is to use the fact that all removes happen from the top of the queue, 
> so move the first "k" events out to hashset and remove from there - we can 
> find k using the write id of the last checkpoint and the current one. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Comment Edited] (FLUME-2264) Log4j Appender + Avro Reflection on string results in an invalid avro schema

2013-12-10 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844322#comment-13844322
 ] 

Brock Noland edited comment on FLUME-2264 at 12/10/13 2:53 PM:
---

 I think initially we should just add a note to not use AvroReflectionEnabled 
when the users will be logging strings.


was (Author: brocknoland):
 I think initially we should just add a note to note use AvroReflectionEnabled 
when the users will be logging strings.

> Log4j Appender + Avro Reflection on string results in an invalid avro schema
> 
>
> Key: FLUME-2264
> URL: https://issues.apache.org/jira/browse/FLUME-2264
> Project: Flume
>  Issue Type: Bug
>Reporter: Brock Noland
> Attachments: FLUME-2264.patch
>
>
> When a user turns on Avro Reflection via AvroReflectionEnabled, and the user 
> logs a string, the result is an invalid avro schema with "string" (including 
> quotes).
> Users do not expect an invalid avro schema.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (FLUME-2264) Log4j Appender + Avro Reflection on string results in an invalid avro schema

2013-12-10 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2264:


Attachment: FLUME-2264.patch

 I think initially we should just add a note to not use AvroReflectionEnabled 
when the users will be logging strings.

> Log4j Appender + Avro Reflection on string results in an invalid avro schema
> 
>
> Key: FLUME-2264
> URL: https://issues.apache.org/jira/browse/FLUME-2264
> Project: Flume
>  Issue Type: Bug
>Reporter: Brock Noland
> Attachments: FLUME-2264.patch
>
>
> When a user turns on Avro Reflection via AvroReflectionEnabled, and the user 
> logs a string, the result is an invalid avro schema with "string" (including 
> quotes).
> Users do not expect and invalid avro schema.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (FLUME-2264) Log4j Appender + Avro Reflection on string results in an invalid avro schema

2013-12-10 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2264:


Description: 
When a user turns on Avro Reflection via AvroReflectionEnabled, and the user 
logs a string, the result is an invalid avro schema with "string" (including 
quotes).

Users do not expect an invalid avro schema.

  was:
When a user turns on Avro Reflection via AvroReflectionEnabled, and the user 
logs a string, the result is an invalid avro schema with "string" (including 
quotes).

Users do not expect an invalid avro schema. I think initially we should just 
add a note to not use AvroReflectionEnabled when the users will be logging 
strings.


> Log4j Appender + Avro Reflection on string results in an invalid avro schema
> 
>
> Key: FLUME-2264
> URL: https://issues.apache.org/jira/browse/FLUME-2264
> Project: Flume
>  Issue Type: Bug
>Reporter: Brock Noland
> Attachments: FLUME-2264.patch
>
>
> When a user turns on Avro Reflection via AvroReflectionEnabled, and the user 
> logs a string, the result is an invalid avro schema with "string" (including 
> quotes).
> Users do not expect an invalid avro schema.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (FLUME-2264) Log4j Appender + Avro Reflection on string results in an invalid avro schema

2013-12-10 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2264:
---

 Summary: Log4j Appender + Avro Reflection on string results in an 
invalid avro schema
 Key: FLUME-2264
 URL: https://issues.apache.org/jira/browse/FLUME-2264
 Project: Flume
  Issue Type: Bug
Reporter: Brock Noland


When a user turns on Avro Reflection via AvroReflectionEnabled, and the user 
logs a string, the result is an invalid avro schema with "string" (including 
quotes).

Users do not expect an invalid avro schema. I think initially we should just 
add a note to not use AvroReflectionEnabled when the users will be logging 
strings.
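To make the failure mode concrete, a small illustration using Avro's reflect 
API directly:

{code}
import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;

public class StringSchemaDemo {
  public static void main(String[] args) {
    // Reflection on a plain String yields the primitive schema, which
    // serializes as "string" -- quotes included -- rather than the
    // record schema downstream readers expect.
    Schema schema = ReflectData.get().getSchema(String.class);
    System.out.println(schema.toString()); // prints: "string"
  }
}
{code}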



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (FLUME-2262) Log4j Appender should use timeStamp field not getTimestamp

2013-12-09 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843886#comment-13843886
 ] 

Brock Noland commented on FLUME-2262:
-

Thank you!

> Log4j Appender should use timeStamp field not getTimestamp
> --
>
> Key: FLUME-2262
> URL: https://issues.apache.org/jira/browse/FLUME-2262
> Project: Flume
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: FLUME-2262.patch
>
>
> getTimestamp was added in log4j 1.2.15, we should use the timestamp field 
> instead:
> 1.2.14: 
> https://github.com/apache/log4j/blob/v1_2_14/src/java/org/apache/log4j/spi/LoggingEvent.java#L124
> trunk: 
> https://github.com/apache/log4j/blob/trunk/src/main/java/org/apache/log4j/spi/LoggingEvent.java#L569



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (FLUME-2262) Log4j Appender should use timeStamp field not getTimestamp

2013-12-09 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2262:


Description: 
getTimestamp was added in log4j 1.2.15, we should use the timestamp field 
instead:

1.2.14: 
https://github.com/apache/log4j/blob/v1_2_14/src/java/org/apache/log4j/spi/LoggingEvent.java#L124

trunk: 
https://github.com/apache/log4j/blob/trunk/src/main/java/org/apache/log4j/spi/LoggingEvent.java#L569

  was:
getTimestamp was added in log4j 1.15, we should use the timestamp field instead:

1.14: 
https://github.com/apache/log4j/blob/v1_2_14/src/java/org/apache/log4j/spi/LoggingEvent.java#L124

trunk: 
https://github.com/apache/log4j/blob/trunk/src/main/java/org/apache/log4j/spi/LoggingEvent.java#L569


> Log4j Appender should use timeStamp field not getTimestamp
> --
>
> Key: FLUME-2262
> URL: https://issues.apache.org/jira/browse/FLUME-2262
> Project: Flume
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: FLUME-2262.patch
>
>
> getTimestamp was added in log4j 1.2.15, we should use the timestamp field 
> instead:
> 1.2.14: 
> https://github.com/apache/log4j/blob/v1_2_14/src/java/org/apache/log4j/spi/LoggingEvent.java#L124
> trunk: 
> https://github.com/apache/log4j/blob/trunk/src/main/java/org/apache/log4j/spi/LoggingEvent.java#L569



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Assigned] (FLUME-2262) Log4j Appender should use timeStamp field not getTimestamp

2013-12-09 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland reassigned FLUME-2262:
---

Assignee: Brock Noland

> Log4j Appender should use timeStamp field not getTimestamp
> --
>
> Key: FLUME-2262
> URL: https://issues.apache.org/jira/browse/FLUME-2262
> Project: Flume
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: FLUME-2262.patch
>
>
> getTimestamp was added in log4j 1.15, we should use the timestamp field 
> instead:
> 1.14: 
> https://github.com/apache/log4j/blob/v1_2_14/src/java/org/apache/log4j/spi/LoggingEvent.java#L124
> trunk: 
> https://github.com/apache/log4j/blob/trunk/src/main/java/org/apache/log4j/spi/LoggingEvent.java#L569



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (FLUME-2262) Log4j Appender should use timeStamp field not getTimestamp

2013-12-09 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2262:


Attachment: FLUME-2262.patch

> Log4j Appender should use timeStamp field not getTimestamp
> --
>
> Key: FLUME-2262
> URL: https://issues.apache.org/jira/browse/FLUME-2262
> Project: Flume
>  Issue Type: Bug
>Reporter: Brock Noland
> Attachments: FLUME-2262.patch
>
>
> getTimestamp was added in log4j 1.15, we should use the timestamp field 
> instead:
> 1.14: 
> https://github.com/apache/log4j/blob/v1_2_14/src/java/org/apache/log4j/spi/LoggingEvent.java#L124
> trunk: 
> https://github.com/apache/log4j/blob/trunk/src/main/java/org/apache/log4j/spi/LoggingEvent.java#L569



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (FLUME-2262) Log4j Appender should use timeStamp field not getTimestamp

2013-12-09 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2262:
---

 Summary: Log4j Appender should use timeStamp field not getTimestamp
 Key: FLUME-2262
 URL: https://issues.apache.org/jira/browse/FLUME-2262
 Project: Flume
  Issue Type: Bug
Reporter: Brock Noland
 Attachments: FLUME-2262.patch

getTimestamp was added in log4j 1.15, we should use the timestamp field instead:

1.14: 
https://github.com/apache/log4j/blob/v1_2_14/src/java/org/apache/log4j/spi/LoggingEvent.java#L124

trunk: 
https://github.com/apache/log4j/blob/trunk/src/main/java/org/apache/log4j/spi/LoggingEvent.java#L569
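For clarity, the compatible access pattern is simply the public field (sketch):

{code}
import org.apache.log4j.spi.LoggingEvent;

public class TimestampAccess {
  // timeStamp is a public field already present in 1.2.14, whereas
  // getTimestamp() only exists from log4j 1.2.15 on.
  static long eventTimestamp(LoggingEvent event) {
    return event.timeStamp; // not event.getTimestamp()
  }
}
{code}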



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (FLUME-2155) Improve replay time

2013-12-08 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842557#comment-13842557
 ] 

Brock Noland commented on FLUME-2155:
-

RB item https://reviews.apache.org/r/16107/

> Improve replay time
> ---
>
> Key: FLUME-2155
> URL: https://issues.apache.org/jira/browse/FLUME-2155
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Brock Noland
> Attachments: 1-2, 10-11, 30-31, 
> 70-71, FLUME-2155-initial.patch, FLUME-2155.2.patch, 
> FLUME-2155.patch, FLUME-FC-SLOW-REPLAY-1.patch, 
> FLUME-FC-SLOW-REPLAY-FIX-1.patch, SmartReplay.pdf, SmartReplay1.1.pdf, 
> fc-test.patch
>
>
> File Channel has scaled so well that people now run channels with sizes in 
> 100's of millions of events. Turns out, replay can be crazy slow even between 
> checkpoints at this scale - because of the remove() method in FlumeEventQueue 
> moving every pointer that follows the one being removed (1 remove causes 99 
> million+ moves for a channel of 100 million!). There are several ways of 
> improving - one being move at the end of replay - sort of like a compaction. 
> Another is to use the fact that all removes happen from the top of the queue, 
> so move the first "k" events out to hashset and remove from there - we can 
> find k using the write id of the last checkpoint and the current one. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (FLUME-2155) Improve replay time

2013-12-08 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2155:


Attachment: FLUME-2155.2.patch

> Improve replay time
> ---
>
> Key: FLUME-2155
> URL: https://issues.apache.org/jira/browse/FLUME-2155
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Brock Noland
> Attachments: 1-2, 10-11, 30-31, 
> 70-71, FLUME-2155-initial.patch, FLUME-2155.2.patch, 
> FLUME-2155.patch, FLUME-FC-SLOW-REPLAY-1.patch, 
> FLUME-FC-SLOW-REPLAY-FIX-1.patch, SmartReplay.pdf, SmartReplay1.1.pdf, 
> fc-test.patch
>
>
> File Channel has scaled so well that people now run channels with sizes in 
> 100's of millions of events. Turns out, replay can be crazy slow even between 
> checkpoints at this scale - because of the remove() method in FlumeEventQueue 
> moving every pointer that follows the one being removed (1 remove causes 99 
> million+ moves for a channel of 100 million!). There are several ways of 
> improving - one being move at the end of replay - sort of like a compaction. 
> Another is to use the fact that all removes happen from the top of the queue, 
> so move the first "k" events out to hashset and remove from there - we can 
> find k using the write id of the last checkpoint and the current one. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (FLUME-2155) Improve replay time

2013-12-06 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841565#comment-13841565
 ] 

Brock Noland commented on FLUME-2155:
-

Sounds good, I am updating it a bit as I think we only want to be adding items 
to the queueSet during replay.

> Improve replay time
> ---
>
> Key: FLUME-2155
> URL: https://issues.apache.org/jira/browse/FLUME-2155
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: 1-2, 10-11, 30-31, 
> 70-71, FLUME-2155-initial.patch, FLUME-2155.patch, 
> FLUME-FC-SLOW-REPLAY-1.patch, FLUME-FC-SLOW-REPLAY-FIX-1.patch, 
> SmartReplay.pdf, SmartReplay1.1.pdf, fc-test.patch
>
>
> File Channel has scaled so well that people now run channels with sizes in 
> 100's of millions of events. Turns out, replay can be crazy slow even between 
> checkpoints at this scale - because of the remove() method in FlumeEventQueue 
> moving every pointer that follows the one being removed (1 remove causes 99 
> million+ moves for a channel of 100 million!). There are several ways of 
> improving - one being move at the end of replay - sort of like a compaction. 
> Another is to use the fact that all removes happen from the top of the queue, 
> so move the first "k" events out to hashset and remove from there - we can 
> find k using the write id of the last checkpoint and the current one. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (FLUME-2155) Improve replay time

2013-12-06 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841519#comment-13841519
 ] 

Brock Noland edited comment on FLUME-2155 at 12/6/13 6:29 PM:
--

Yes, we could use it to store indexes. However, I noted I was unable to find a 
time where something was actually removed and wasn't at index 0. So I think 
most real removes are very fast. Additionally I think we should limit mapdb's 
use for now. Once we are more comfortable with it we can expand its use. For 
example, the overwrite map would be a good place.


was (Author: brocknoland):
Yes, we could use it to store indexes. However, I noted I was unable to find a 
time where something was actually removed and wasn't at index 0. So I think 
real removes are very fast. Additionally I think we should limit mapdb's use 
for now. Once we are more comfortable with it we can expand its use. For 
example, the overwrite map would be a good place.

> Improve replay time
> ---
>
> Key: FLUME-2155
> URL: https://issues.apache.org/jira/browse/FLUME-2155
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: 1-2, 10-11, 30-31, 
> 70-71, FLUME-2155-initial.patch, FLUME-2155.patch, 
> FLUME-FC-SLOW-REPLAY-1.patch, FLUME-FC-SLOW-REPLAY-FIX-1.patch, 
> SmartReplay.pdf, SmartReplay1.1.pdf, fc-test.patch
>
>
> File Channel has scaled so well that people now run channels with sizes in 
> 100's of millions of events. Turns out, replay can be crazy slow even between 
> checkpoints at this scale - because of the remove() method in FlumeEventQueue 
> moving every pointer that follows the one being removed (1 remove causes 99 
> million+ moves for a channel of 100 million!). There are several ways of 
> improving - one being move at the end of replay - sort of like a compaction. 
> Another is to use the fact that all removes happen from the top of the queue, 
> so move the first "k" events out to hashset and remove from there - we can 
> find k using the write id of the last checkpoint and the current one. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (FLUME-2155) Improve replay time

2013-12-06 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841519#comment-13841519
 ] 

Brock Noland commented on FLUME-2155:
-

Yes, we could use it to store indexes. However, I noted I was unable to find a 
time where something was actually removed and wasn't at index 0. So I think 
real removes are very fast. Additionally I think we should limit mapdb's use 
for now. Once we are more comfortable with it we can expand its use. For 
example, the overwrite map would be a good place.

> Improve replay time
> ---
>
> Key: FLUME-2155
> URL: https://issues.apache.org/jira/browse/FLUME-2155
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: 1-2, 10-11, 30-31, 
> 70-71, FLUME-2155-initial.patch, FLUME-2155.patch, 
> FLUME-FC-SLOW-REPLAY-1.patch, FLUME-FC-SLOW-REPLAY-FIX-1.patch, 
> SmartReplay.pdf, SmartReplay1.1.pdf, fc-test.patch
>
>
> File Channel has scaled so well that people now run channels with sizes in 
> 100's of millions of events. Turns out, replay can be crazy slow even between 
> checkpoints at this scale - because of the remove() method in FlumeEventQueue 
> moving every pointer that follows the one being removed (1 remove causes 99 
> million+ moves for a channel of 100 million!). There are several ways of 
> improving - one being move at the end of replay - sort of like a compaction. 
> Another is to use the fact that all removes happen from the top of the queue, 
> so move the first "k" events out to hashset and remove from there - we can 
> find k using the write id of the last checkpoint and the current one. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (FLUME-2155) Improve replay time

2013-12-06 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2155:


Attachment: FLUME-FC-SLOW-REPLAY-FIX-1.patch
FLUME-FC-SLOW-REPLAY-1.patch

> Improve replay time
> ---
>
> Key: FLUME-2155
> URL: https://issues.apache.org/jira/browse/FLUME-2155
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: 1-2, 10-11, 30-31, 
> 70-71, FLUME-2155-initial.patch, FLUME-2155.patch, 
> FLUME-FC-SLOW-REPLAY-1.patch, FLUME-FC-SLOW-REPLAY-FIX-1.patch, 
> SmartReplay.pdf, SmartReplay1.1.pdf, fc-test.patch
>
>
> File Channel has scaled so well that people now run channels with sizes in 
> 100's of millions of events. Turns out, replay can be crazy slow even between 
> checkpoints at this scale - because of the remove() method in FlumeEventQueue 
> moving every pointer that follows the one being removed (1 remove causes 99 
> million+ moves for a channel of 100 million!). There are several ways of 
> improving - one being move at the end of replay - sort of like a compaction. 
> Another is to use the fact that all removes happen from the top of the queue, 
> so move the first "k" events out to hashset and remove from there - we can 
> find k using the write id of the last checkpoint and the current one. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (FLUME-2155) Improve replay time

2013-12-06 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841465#comment-13841465
 ] 

Brock Noland commented on FLUME-2155:
-

Hi Hari,

In my testing this patch does look to improve the performance of copy! Nice 
work!

However, I was unable to find a scenario where copy took a significant amount 
of time during replay. I think we should find a case where copy takes a large 
amount of time during a file channel replay before considering the change to 
the copy code.

However, in testing this patch, I do believe I have found the scenario I have 
most often seen cause long replays. In addition, this is the scenario I believe 
played out in FLUME-2118. In the thread dump attached to that ticket you see 
the code in FEQ.remove():

{noformat}
"lifecycleSupervisor-1-0" prio=10 tid=0x7fea505f7000 nid=0x279e runnable 
[0x7fe84240d000]
   java.lang.Thread.State: RUNNABLE
  at 
org.apache.flume.channel.file.FlumeEventQueue.remove(FlumeEventQueue.java:195)
  - locked <0x7fe84d0007b8> (a 
org.apache.flume.channel.file.FlumeEventQueue)
  at 
org.apache.flume.channel.file.ReplayHandler.processCommit(ReplayHandler.java:404)
  at 
org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:327)
  at org.apache.flume.channel.file.Log.doReplay(Log.java:503)
  at org.apache.flume.channel.file.Log.replay(Log.java:430)
  at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:301)
{noformat}

In the attached patch, FLUME-FC-SLOW-REPLAY-1.patch, which is for demonstration 
purposes, there is a new file, TestFileChannelReplayPerformance.java. The test 
in that file demonstrates the issue.

The issue is that when there are many takes in files that don't have associated 
puts, we search the entire queue, which is O(N) for *each* take in the file.

As you noted, using a fast lookup data structure would solve this. However, if 
it were a memory-based data structure, it would also consume large amounts of 
memory where it did not previously. Specifically, your example of 100 million 
capacity would result in a 1.4GB data structure (100 million * 16 bytes; I use 
16 bytes because we have to store Long objects, assuming a 64-bit JVM).

I think we need a fast off-heap data structure to perform these lookups. There 
is a project called MapDB (formerly JDBM) that I have used in the past which 
provides such a data structure.
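A hedged sketch of the idea (MapDB 0.9-era API; the file name and class are 
illustrative, not the patch's code):

{code}
import java.io.File;
import java.util.Set;
import org.mapdb.DB;
import org.mapdb.DBMaker;

// Mirror the queue's event pointers in an off-heap MapDB set so the
// replay-time membership check is an O(1) lookup instead of an O(N)
// scan of the queue.
public class OffHeapQueueSet {
  private final DB db;
  private final Set<Long> queueSet;

  public OffHeapQueueSet(File checkpointDir) {
    db = DBMaker.newFileDB(new File(checkpointDir, "queueset"))
        .closeOnJvmShutdown()
        .transactionDisable()
        .deleteFilesAfterClose()
        .make();
    queueSet = db.getHashSet("QueueSet");
  }

  public void add(long pointer) { queueSet.add(pointer); }

  public boolean contains(long pointer) { return queueSet.contains(pointer); }

  public void close() { db.close(); }
}
{code}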

In the attached patch, FLUME-FC-SLOW-REPLAY-FIX-1.patch, I have used it to 
provide an off-heap Set 
which mirrors the FEQ. Without FLUME-FC-SLOW-REPLAY-FIX-1.patch, 
TestFileChannelReplayPerformance
takes 50 minutes to replay while it takes only 6.5 minutes with the fix.

Without fix:
{noformat}
FlumeEventQueue.logTimings Search Count = 669000.0, Search Time = 3044957.0, 
Copy Count = 321014.0, Copy Time = 1103.0
TestFileChannelReplayPerformance.testReplayPerformance Total Replay Time = 
3500624
{noformat}

With fix:
{noformat}
 FlumeEventQueue.logTimings Search Count = 274449.0, Search Time = 1080.0, Copy 
Count = 274449.0, Copy Time = 1012.0
 TestFileChannelReplayPerformance.testReplayPerformance Total Replay Time = 
396338
{noformat}


NOTE: FLUME-FC-SLOW-REPLAY-1.patch and FLUME-FC-SLOW-REPLAY-FIX-1.patch are not 
for commit.



> Improve replay time
> ---
>
> Key: FLUME-2155
> URL: https://issues.apache.org/jira/browse/FLUME-2155
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: 1-2, 10-11, 30-31, 
> 70-71, FLUME-2155-initial.patch, FLUME-2155.patch, 
> FLUME-FC-SLOW-REPLAY-1.patch, FLUME-FC-SLOW-REPLAY-FIX-1.patch, 
> SmartReplay.pdf, SmartReplay1.1.pdf, fc-test.patch
>
>
> File Channel has scaled so well that people now run channels with sizes in 
> 100's of millions of events. Turns out, replay can be crazy slow even between 
> checkpoints at this scale - because of the remove() method in FlumeEventQueue 
> moving every pointer that follows the one being removed (1 remove causes 99 
> million+ moves for a channel of 100 million!). There are several ways of 
> improving - one being move at the end of replay - sort of like a compaction. 
> Another is to use the fact that all removes happen from the top of the queue, 
> so move the first "k" events out to hashset and remove from there - we can 
> find k using the write id of the last checkpoint and the current one. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (FLUME-2260) Recommend Dual Checkpoints in file channel documentation

2013-12-06 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2260:
---

 Summary: Recommend Dual Checkpoints in file channel documentation
 Key: FLUME-2260
 URL: https://issues.apache.org/jira/browse/FLUME-2260
 Project: Flume
  Issue Type: Task
Reporter: Brock Noland


The work done by Hari on dual checkpoints is extremely valuable in mitigating 
long file channel replays. I think we should add a strong note recommending it 
be configured.
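For reference, enabling this is a small change to the agent configuration. A 
minimal sketch, assuming an agent named a1 with a file channel c1 (the paths 
are invented; useDualCheckpoints and backupCheckpointDir are the relevant 
properties):

{noformat}
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /data1/flume/checkpoint
a1.channels.c1.dataDirs = /data1/flume/data
# Keep a second checkpoint so a corrupt primary does not force a full replay.
a1.channels.c1.useDualCheckpoints = true
# Must be a different directory (ideally a different disk) than checkpointDir.
a1.channels.c1.backupCheckpointDir = /data2/flume/checkpoint-backup
{noformat}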



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (FLUME-2260) Recommend Dual Checkpoints in file channel documentation

2013-12-06 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841380#comment-13841380
 ] 

Brock Noland commented on FLUME-2260:
-

[~hshreedharan] thoughts on this?

> Recommend Dual Checkpoints in file channel documentation
> 
>
> Key: FLUME-2260
> URL: https://issues.apache.org/jira/browse/FLUME-2260
> Project: Flume
>  Issue Type: Task
>Reporter: Brock Noland
>
> The work done by Hari on dual checkpoints is extremely valuable in mitigating 
> long file channel replays. I think we should add a strong note recommending 
> it be configured.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (FLUME-2118) Occasional multi-hour pauses in file channel replay

2013-12-06 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841375#comment-13841375
 ] 

Brock Noland commented on FLUME-2118:
-

Never mind my last comment. I do think this scenario occurs most often when 
dual checkpoints are not enabled, because the slow remove() code is hit much 
more often during a full replay.

We'll take this forward in FLUME-2155.

TL;DR: Enable dual checkpoints and you'll see this less often.

> Occasional multi-hour pauses in file channel replay
> ---
>
> Key: FLUME-2118
> URL: https://issues.apache.org/jira/browse/FLUME-2118
> Project: Flume
>  Issue Type: Bug
>  Components: File Channel
>Affects Versions: v1.5.0
>Reporter: Juhani Connolly
> Attachments: flume-log, flume-thread-dump, gc-flume.log.20130702
>
>
> Sometimes during replay, immediately after an EOF of one log, the replay will 
> pause for a long time.
> Here are two samples from this morning when we restarted our 3 aggregators 
> and 2 of them hit this issue.
> 02 7 2013 03:06:30,089 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 220 
> records
> 02 7 2013 03:06:30,179 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 221 
> records
> 02 7 2013 03:06:30,241 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.LogFile$SequentialReader.next:505)  - 
> Encountered EOF at 1623195625 in /data2/flume-data/log-1184
> 02 7 2013 06:23:27,629 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 222 
> records
> 02 7 2013 06:23:28,641 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 223 
> records
> 02 7 2013 06:23:29,162 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 224 
> records
> 02 7 2013 06:23:30,118 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 225 
> records
> 02 7 2013 06:23:30,750 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 226 
> records
> 02 7 2013 08:03:00,942 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 216 
> records
> 02 7 2013 08:03:01,055 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 217 
> records
> 02 7 2013 08:03:01,168 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 218 
> records
> 02 7 2013 08:03:01,181 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.LogFile$SequentialReader.next:505)  - 
> Encountered EOF at 1623195640 in /data2/flume-data/log-1182
> 02 7 2013 14:45:55,302 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 219 
> records
> 02 7 2013 14:45:56,282 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 220 
> records
> 02 7 2013 14:45:57,084 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 221 
> records
> 02 7 2013 14:45:59,043 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 222 
> records
> I've tried for an hour and some to track down the cause of this. There's 
> nothing suspicious turning up on ganglia, and a cursory review of the code 
> didn't turn up anything overly suspicious. Owing to time limitations I can't 
> dig further at this time.
> We run a version of flume from somewhat before the current 1.4 release 
> candidate(hash is eefefa941a60c0982f0957804be0cafb4d83e46e) there doesn't 
> appear to be any replay patches since then.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (FLUME-2181) Optionally disable File Channel fsyncs

2013-12-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840518#comment-13840518
 ] 

Brock Noland edited comment on FLUME-2181 at 12/5/13 8:16 PM:
--

0. read header - bad event (bad file when doing replay)


was (Author: brocknoland):
0. read header - bad event and bad file

> Optionally disable File Channel fsyncs 
> ---
>
> Key: FLUME-2181
> URL: https://issues.apache.org/jira/browse/FLUME-2181
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2181.patch
>
>
> This will give File Channel performance a big boost, at the cost of possible 
> data loss if a crash happens between checkpoints. 
> Also we should make it configurable, with default to false. If the user does 
> not mind slight inconsistencies, this feature can be explicitly enabled 
> through configuration. So if it is not configured, then the behavior will be 
> exactly as it is now.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (FLUME-2181) Optionally disable File Channel fsyncs

2013-12-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840518#comment-13840518
 ] 

Brock Noland commented on FLUME-2181:
-

0. read header - bad event and bad file

> Optionally disable File Channel fsyncs 
> ---
>
> Key: FLUME-2181
> URL: https://issues.apache.org/jira/browse/FLUME-2181
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2181.patch
>
>
> This will give File Channel performance a big boost, at the cost of possible 
> data loss if a crash happens between checkpoints. 
> Also we should make it configurable, with default to false. If the user does 
> not mind slight inconsistencies, this feature can be explicitly enabled 
> through configuration. So if it is not configured, then the behavior will be 
> exactly as it is now.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (FLUME-2181) Optionally disable File Channel fsyncs

2013-12-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840492#comment-13840492
 ] 

Brock Noland commented on FLUME-2181:
-

bq. We'd use the checksum for this right?

We'll probably have to create a new exception, InvalidEventException or 
something, and throw that whenever an event is bad in the replay handler.

> Optionally disable File Channel fsyncs 
> ---
>
> Key: FLUME-2181
> URL: https://issues.apache.org/jira/browse/FLUME-2181
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2181.patch
>
>
> This will give File Channel performance a big boost, at the cost of possible 
> data loss if a crash happens between checkpoints. 
> Also we should make it configurable, with default to false. If the user does 
> not mind slight inconsistencies, this feature can be explicitly enabled 
> through configuration. So if it is not configured, then the behavior will be 
> exactly as it is now.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (FLUME-2181) Optionally disable File Channel fsyncs

2013-12-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840458#comment-13840458
 ] 

Brock Noland commented on FLUME-2181:
-

bq. We actually have exactly one sequential writer to each file. So all writes 
before a sync call get fsynced to disk (we can't make the first half of a file 
dirty after we fsync the 2nd half - since all writes are sequential). Yes, it 
is possible that the OS flushes the pages corresponding to the 2nd half before 
flushing the ones corresponding to the first half.

Right, I was referring to the OS, which we have no control over.

bq. So we will actually need to seek to each offset, read the buffer - try to 
parse it 

We'd use the checksum for this right?
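As a generic illustration of checksum-based validation (a CRC32 over the event 
body; this is a sketch, not File Channel's actual on-disk record format):

{noformat}
import java.util.zip.CRC32;

public class EventChecksum {
  // Computed over the serialized event body at write time and stored
  // alongside the record.
  static long checksum(byte[] body) {
    CRC32 crc = new CRC32();
    crc.update(body, 0, body.length);
    return crc.getValue();
  }

  // At replay time, recompute and compare; a mismatch means the event
  // (e.g. a hole the OS never flushed) is corrupt and must not be replayed.
  static boolean isValid(byte[] body, long storedChecksum) {
    return checksum(body) == storedChecksum;
  }

  public static void main(String[] args) {
    byte[] event = "hello".getBytes();
    long stored = checksum(event);
    event[0] = 0; // simulate a hole left by a crash
    System.out.println(isValid(event, stored)); // false
  }
}
{noformat}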

> Optionally disable File Channel fsyncs 
> ---
>
> Key: FLUME-2181
> URL: https://issues.apache.org/jira/browse/FLUME-2181
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2181.patch
>
>
> This will give File Channel performance a big boost, at the cost of possible 
> data loss if a crash happens between checkpoints. 
> Also we should make it configurable, with default to false. If the user does 
> not mind slight inconsistencies, this feature can be explicitly enabled 
> through configuration. So if it is not configured, then the behavior will be 
> exactly as it is now.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (FLUME-2181) Optionally disable File Channel fsyncs

2013-12-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840421#comment-13840421
 ] 

Brock Noland edited comment on FLUME-2181 at 12/5/13 7:04 PM:
--

1) the close won't occur if the process is killed with SIGKILL so we need to 
flush explicitly.

2) 

bq. In reality, all takes past that event is really not valid 

Are we assuming that the data will be written to disk sequentially? I don't 
believe that will always be the case. For example, without any fsyncs the last 
half of a file could be written to disk before the first half. Thus it's 
possible the first half of the file is corrupt but not the second half. How do 
we handle this case?

Additionally, it's possible that the corruption occurs inside an event. That 
is, the event header is correct and the next event header is correct, but the 
inside of the event is all nulls. In this case we would be sending invalid 
data downstream.

bq. we can check for this by simply checking the length of the file before 
doing a take

The file is pre-allocated so I don't follow how this will work?


was (Author: brocknoland):
1) the close won't occur if the process is killed with SIGKILL so we need to 
flush explicitly.

2) 

bq. In reality, all takes past that event is really not valid 

Are we assuming that the data will be written to disk sequentially? I don't 
believe that will always be the case. For example, without any fsyncs the last 
half of a file could be written to disk before the first half. Thus it's 
possible the first half of the file is corrupt but not the second half. How do 
we handle this case?

Additionally, it's possible that the corruption occurs inside an event. That 
is, the event header is correct and the next event header is correct, but the 
inside of the event is all nulls. In this case we would be sending invalid 
data downstream.

bq we can check for this by simply checking the length of the file before doing 
a take

The file is pre-allocated so I don't follow how this will work?

> Optionally disable File Channel fsyncs 
> ---
>
> Key: FLUME-2181
> URL: https://issues.apache.org/jira/browse/FLUME-2181
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2181.patch
>
>
> This will give File Channel performance a big boost, at the cost of possible 
> data loss if a crash happens between checkpoints. 
> Also we should make it configurable, with default to false. If the user does 
> not mind slight inconsistencies, this feature can be explicitly enabled 
> through configuration. So if it is not configured, then the behavior will be 
> exactly as it is now.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (FLUME-2181) Optionally disable File Channel fsyncs

2013-12-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840421#comment-13840421
 ] 

Brock Noland commented on FLUME-2181:
-

1) the close won't occur if the process is killed with SIGKILL so we need to 
flush explicitly.

2) 

bq. In reality, all takes past that event is really not valid 

Are we assuming that the data will be written to disk sequentially? I don't 
believe that will always be the case. For example, without any fsyncs the last 
half of a file could be written to disk before the first half. Thus it's 
possible the first half of the file is corrupt but not the second half. How do 
we handle this case?

Additionally, it's possible that the corruption occurs inside an event. That 
is, the event header is correct and the next event header is correct, but the 
inside of the event is all nulls. In this case we would be sending invalid 
data downstream.

bq. we can check for this by simply checking the length of the file before 
doing a take

The file is pre-allocated so I don't follow how this will work?

> Optionally disable File Channel fsyncs 
> ---
>
> Key: FLUME-2181
> URL: https://issues.apache.org/jira/browse/FLUME-2181
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2181.patch
>
>
> This will give File Channel performance a big boost, at the cost of possible 
> data loss if a crash happens between checkpoints. 
> Also we should make it configurable, with default to false. If the user does 
> not mind slight inconsistencies, this feature can be explicitly enabled 
> through configuration. So if it is not configured, then the behavior will be 
> exactly as it is now.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (FLUME-2181) Optionally disable File Channel fsyncs

2013-12-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840246#comment-13840246
 ] 

Brock Noland commented on FLUME-2181:
-

Hari,

Two review items: syncExecutor can be null during close, and even if we don't 
sync, I do think we should flush, no? Otherwise a kill of the process could 
lose data even without the machine going down.

Can you speak to the action the user will take when the channel is corrupt? 
From what I can tell, it's possible holes develop in the file during a crash, 
possibly contained within an event or spanning event boundaries. Should this 
feature coincide with the ability to skip bad events in the logs and to skip a 
corrupt end of file during replay?
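Conceptually, skipping bad events would turn the replay loop into something 
like the sketch below (all names are invented; InvalidEventException is the 
hypothetical exception discussed on this JIRA, not an existing Flume class):

{noformat}
import java.io.EOFException;
import java.io.IOException;

class InvalidEventException extends IOException {}

interface RecordReader {
  boolean hasNext();
  // Throws InvalidEventException when the stored checksum does not
  // match the bytes read back.
  byte[] next() throws IOException;
}

class SkippingReplayer {
  int replay(RecordReader reader) throws IOException {
    int skipped = 0;
    while (reader.hasNext()) {
      try {
        byte[] record = reader.next();
        // ...apply the record to the queue here...
      } catch (InvalidEventException e) {
        skipped++;   // log and keep going instead of aborting the replay
      } catch (EOFException e) {
        break;       // treat a corrupt tail as end-of-file
      }
    }
    return skipped;
  }
}
{noformat}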

> Optionally disable File Channel fsyncs 
> ---
>
> Key: FLUME-2181
> URL: https://issues.apache.org/jira/browse/FLUME-2181
> Project: Flume
>  Issue Type: Improvement
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-2181.patch
>
>
> This will give File Channel performance a big boost, at the cost of possible 
> data loss if a crash happens between checkpoints. 
> Also we should make it configurable, with default to false. If the user does 
> not mind slight inconsistencies, this feature can be explicitly enabled 
> through configuration. So if it is not configured, then the behavior will be 
> exactly as it is now.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (FLUME-2118) Occasional multi-hour pauses in file channel replay

2013-12-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840041#comment-13840041
 ] 

Brock Noland commented on FLUME-2118:
-

I wonder if this issue can be considered resolved by dual checkpoints?

For anyone following along at home, I'd recommend using dual checkpoints when 
configuring file channel to avoid long replays.

> Occasional multi-hour pauses in file channel replay
> ---
>
> Key: FLUME-2118
> URL: https://issues.apache.org/jira/browse/FLUME-2118
> Project: Flume
>  Issue Type: Bug
>  Components: File Channel
>Affects Versions: v1.5.0
>Reporter: Juhani Connolly
> Attachments: flume-log, flume-thread-dump, gc-flume.log.20130702
>
>
> Sometimes during replay, immediately after an EOF of one log, the replay will 
> pause for a long time.
> Here are two samples from this morning when we restarted our 3 aggregators 
> and 2 of them hit this issue.
> 02 7 2013 03:06:30,089 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 220 
> records
> 02 7 2013 03:06:30,179 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 221 
> records
> 02 7 2013 03:06:30,241 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.LogFile$SequentialReader.next:505)  - 
> Encountered EOF at 1623195625 in /data2/flume-data/log-1184
> 02 7 2013 06:23:27,629 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 222 
> records
> 02 7 2013 06:23:28,641 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 223 
> records
> 02 7 2013 06:23:29,162 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 224 
> records
> 02 7 2013 06:23:30,118 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 225 
> records
> 02 7 2013 06:23:30,750 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 226 
> records
> 02 7 2013 08:03:00,942 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 216 
> records
> 02 7 2013 08:03:01,055 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 217 
> records
> 02 7 2013 08:03:01,168 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 218 
> records
> 02 7 2013 08:03:01,181 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.LogFile$SequentialReader.next:505)  - 
> Encountered EOF at 1623195640 in /data2/flume-data/log-1182
> 02 7 2013 14:45:55,302 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 219 
> records
> 02 7 2013 14:45:56,282 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 220 
> records
> 02 7 2013 14:45:57,084 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 221 
> records
> 02 7 2013 14:45:59,043 INFO  [lifecycleSupervisor-1-0] 
> (org.apache.flume.channel.file.ReplayHandler.replayLog:292)  - Read 222 
> records
> I've tried for an hour and some to track down the cause of this. There's 
> nothing suspicious turning up on ganglia, and a cursory review of the code 
> didn't turn up anything overly suspicious. Owing to time limitations I can't 
> dig further at this time.
> We run a version of flume from somewhat before the current 1.4 release 
> candidate(hash is eefefa941a60c0982f0957804be0cafb4d83e46e) there doesn't 
> appear to be any replay patches since then.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (FLUME-2199) Flume builds with new version require mvn install before site can be generated

2013-10-16 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797519#comment-13797519
 ] 

Brock Noland commented on FLUME-2199:
-

[~abayer] could you post a full patch?  I am working on HIVE-5107 and I'd like 
to follow this best practice.

> Flume builds with new version require mvn install before site can be generated
> --
>
> Key: FLUME-2199
> URL: https://issues.apache.org/jira/browse/FLUME-2199
> Project: Flume
>  Issue Type: Bug
>  Components: Build
>Affects Versions: v1.4.0
>Reporter: Andrew Bayer
>Assignee: Andrew Bayer
> Fix For: v1.5.0
>
> Attachments: FLUME-2199.patch
>
>
> At this point, if you change the version for Flume, you need to run a mvn 
> install before you can run with -Psite (or, for that matter, javadoc:javadoc) 
> enabled. This is because the top-level POM in flume.git/pom.xml is both the 
> parent POM and the root of the reactor - since it's the parent, it's got to 
> run before any of the children that inherit from it, but site generation 
> should be running *after* all the children, so that it properly pulls in the 
> reactor's build of each child module, rather than having to pull in one 
> already installed/deployed before the build starts.
> There are a bunch of other reasons to split parent POM and top-level POM, but 
> that's the biggest one right there. 
> Also, the javadoc jar generation is a bit messed up - every module's javadoc 
> jar contains not only its own javadocs but the javadocs for every Flume 
> module it depends on. That, again, may make sense in a site context for the 
> top-level, but not for the individual modules. This results in unnecessary 
> bloat in the javadoc jars, and unnecessary time spent downloading the 
> "*-javadoc-resources.jar" for every dependency each module has, due to how 
> the javadoc plugin works. Also the whole site generation per-module thing, 
> which I am not a fan of in most cases. I don't think it's needed here. 
> Tweaking the site plugin not to run anywhere but the top-level and the 
> javadoc plugin to not do the dependency aggregation anywhere but the 
> top-level should make a big difference on build speed.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (FLUME-2155) Improve replay time

2013-08-13 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738660#comment-13738660
 ] 

Brock Noland commented on FLUME-2155:
-

Hari,

Thanks for the numbers! I do agree that the new algorithm could significantly 
reduce copy time. I am very sorry, I am afraid I was not clear enough earlier. 
The document makes the assumption that by improving the copy cost we'll 
significantly improve replay times. What I'd like to see is empirical evidence 
of that. I think this could be achieved by placing timers around the remove 
method and separating out search time from copy time.

Does this make sense?
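Concretely, the instrumentation meant here is just two pairs of accumulators 
inside remove(). A sketch, with all names invented (the real queue is backed 
by a memory-mapped file, not a plain array):

{noformat}
class InstrumentedQueue {
  private long[] pointers = new long[100]; // event pointers, in queue order
  private int size;
  private long searchNanos, searchCount, copyNanos, copyCount;

  synchronized boolean remove(long pointer) {
    long t0 = System.nanoTime();
    int index = -1;
    for (int i = 0; i < size; i++) {       // the O(N) linear search
      if (pointers[i] == pointer) { index = i; break; }
    }
    searchNanos += System.nanoTime() - t0;
    searchCount++;
    if (index < 0) {
      return false;
    }
    long t1 = System.nanoTime();
    // The copy: every pointer after index shifts left by one slot.
    System.arraycopy(pointers, index + 1, pointers, index, size - index - 1);
    size--;
    copyNanos += System.nanoTime() - t1;
    copyCount++;
    return true;
  }

  void logTimings() {
    System.out.println("Search Count = " + searchCount
        + ", Search Time = " + searchNanos / 1000000
        + ", Copy Count = " + copyCount
        + ", Copy Time = " + copyNanos / 1000000);
  }
}
{noformat}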

> Improve replay time
> ---
>
> Key: FLUME-2155
> URL: https://issues.apache.org/jira/browse/FLUME-2155
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: SmartReplay.pdf
>
>
> File Channel has scaled so well that people now run channels with sizes in 
> 100's of millions of events. Turns out, replay can be crazy slow even between 
> checkpoints at this scale - because of the remove() method in FlumeEventQueue 
> moving every pointer that follows the one being removed (1 remove causes 99 
> million+ moves for a channel of 100 million!). There are several ways of 
> improving - one being move at the end of replay - sort of like a compaction. 
> Another is to use the fact that all removes happen from the top of the queue, 
> so move the first "k" events out to hashset and remove from there - we can 
> find k using the write id of the last checkpoint and the current one. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (FLUME-2155) Improve replay time

2013-08-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737360#comment-13737360
 ] 

Brock Noland commented on FLUME-2155:
-

Hari,

Thank you for taking this on! Before we hack on this can we put a few counters 
in to show the cost of the linear search versus the move?

"it is likely to be non-trivial"

I assume you mean trivial?


> Improve replay time
> ---
>
> Key: FLUME-2155
> URL: https://issues.apache.org/jira/browse/FLUME-2155
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: SmartReplay.pdf
>
>
> File Channel has scaled so well that people now run channels with sizes in 
> 100's of millions of events. Turns out, replay can be crazy slow even between 
> checkpoints at this scale - because of the remove() method in FlumeEventQueue 
> moving every pointer that follows the one being removed (1 remove causes 99 
> million+ moves for a channel of 100 million!). There are several ways of 
> improving - one being move at the end of replay - sort of like a compaction. 
> Another is to use the fact that all removes happen from the top of the queue, 
> so move the first "k" events out to hashset and remove from there - we can 
> find k using the write id of the last checkpoint and the current one. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (FLUME-2115) FileChannel tool to truncate files at first sign of corruption

2013-06-27 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2115:
---

 Summary: FileChannel tool to truncate files at first sign of 
corruption
 Key: FLUME-2115
 URL: https://issues.apache.org/jira/browse/FLUME-2115
 Project: Flume
  Issue Type: Bug
Reporter: Brock Noland


In FLUME-1586 we added a tool to handle individual event corruption. Using 
this same method we could add a tool which reads a file and, when it finds 
that the file is corrupt, fills the remaining portion of the file with 
truncation records.

This would allow users who end up with corrupt logs for whatever reason to 
salvage the remaining portion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (FLUME-2114) flume-ng script should handle common options such as -verbose:gc

2013-06-26 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2114:
---

 Summary: flume-ng script should handle common options such as 
-verbose:gc
 Key: FLUME-2114
 URL: https://issues.apache.org/jira/browse/FLUME-2114
 Project: Flume
  Issue Type: Improvement
Reporter: Brock Noland


It'd be nice if we handled super-common options like -verbose:* similarly to 
how we handle -X and -D options:

https://github.com/apache/flume/blob/trunk/bin/flume-ng#L302

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (FLUME-1285) FileChannel has a dependency on Hadoop IO classes

2013-06-23 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691574#comment-13691574
 ] 

Brock Noland commented on FLUME-1285:
-

Running tests.

> FileChannel has a dependency on Hadoop IO classes
> -
>
> Key: FLUME-1285
> URL: https://issues.apache.org/jira/browse/FLUME-1285
> Project: Flume
>  Issue Type: Bug
>Affects Versions: v1.2.0
>Reporter: Mike Percy
>Assignee: Israel Ekpo
>Priority: Critical
> Fix For: v1.4.0
>
> Attachments: FLUME-1285-1.patch, FLUME-1285.20130429.patch, 
> FLUME-1285-3.patch, FLUME-1285-4.patch
>
>
> The FileChannel has a dependency on Hadoop IO classes. This may not be 
> necessary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (FLUME-1917) FileChannel group commit (coalesce fsync)

2013-06-23 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691572#comment-13691572
 ] 

Brock Noland commented on FLUME-1917:
-

Thanks Hari! I committed this to trunk and 1.4.

> FileChannel group commit (coalesce fsync)
> -
>
> Key: FLUME-1917
> URL: https://issues.apache.org/jira/browse/FLUME-1917
> Project: Flume
>  Issue Type: Bug
>  Components: File Channel
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Fix For: v1.4.0
>
> Attachments: FLUME-1917.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (FLUME-1917) FileChannel group commit (coalesce fsync)

2013-06-23 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691561#comment-13691561
 ] 

Brock Noland commented on FLUME-1917:
-

Hey Hari,

Yeah, I should have noted that thread-3's sync is a no-op. The logic is 
obvious when we pass the offset into the sync method, but it looks like the 
current patch will result in the same number of syncs.

> FileChannel group commit (coalesce fsync)
> -
>
> Key: FLUME-1917
> URL: https://issues.apache.org/jira/browse/FLUME-1917
> Project: Flume
>  Issue Type: Bug
>  Components: File Channel
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-1917.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (FLUME-1285) FileChannel has a dependency on Hadoop IO classes

2013-06-23 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691559#comment-13691559
 ] 

Brock Noland commented on FLUME-1285:
-

This looks pretty good. Anyone think we shouldn't put it in 1.4?

> FileChannel has a dependency on Hadoop IO classes
> -
>
> Key: FLUME-1285
> URL: https://issues.apache.org/jira/browse/FLUME-1285
> Project: Flume
>  Issue Type: Bug
>Affects Versions: v1.2.0
>Reporter: Mike Percy
>Assignee: Israel Ekpo
>Priority: Critical
> Fix For: v1.4.0
>
> Attachments: FLUME-1285-1.patch, FLUME-1285.20130429.patch, 
> FLUME-1285-3.patch
>
>
> The FileChannel has a dependency on Hadoop IO classes. This may not be 
> necessary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (FLUME-1917) FileChannel group commit (coalesce fsync)

2013-06-22 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691161#comment-13691161
 ] 

Brock Noland commented on FLUME-1917:
-

Hey Hari,

Thanks for the patch! I think it will work, but I think there is a race 
that could cause "extra" syncs? Let me know if I am off base:

{noformat}
thread-1 commit
thread-2 commit
thread-2 sync
thread-3 commit
thread-1 sync <- here thread-1 will do a sync even though it's not required?
thread-3 sync
{noformat}

I think we could eliminate this by returning the offset from put/rollback and 
then passing it into sync?
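A sketch of that offset-based coalescing (all names invented): commit returns 
the highest offset the caller needs durable, and sync() becomes a no-op when 
an earlier fsync has already covered that offset:

{noformat}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

class CoalescedSync {
  private final FileChannel channel;
  private long lastSyncedOffset;

  CoalescedSync(FileChannel channel) {
    this.channel = channel;
  }

  // Write the commit record and return the offset the caller must
  // later pass to sync().
  synchronized long commit(ByteBuffer buf) throws IOException {
    channel.write(buf);
    return channel.position();
  }

  // fsync only if no earlier fsync has already covered our offset.
  synchronized void sync(long requiredOffset) throws IOException {
    if (requiredOffset <= lastSyncedOffset) {
      return; // coalesced: another thread's force() covered our writes
    }
    long offsetAtSync = channel.position();
    channel.force(false); // one fsync covers every write up to offsetAtSync
    lastSyncedOffset = offsetAtSync;
  }
}
{noformat}

In the scenario above, thread-2's sync would record thread-1's offset as 
already durable, so thread-1's later sync returns immediately.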



> FileChannel group commit (coalesce fsync)
> -
>
> Key: FLUME-1917
> URL: https://issues.apache.org/jira/browse/FLUME-1917
> Project: Flume
>  Issue Type: Bug
>  Components: File Channel
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: FLUME-1917.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (FLUME-1586) File Channel should support verifying integrity of individual events.

2013-06-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland resolved FLUME-1586.
-

   Resolution: Fixed
Fix Version/s: v1.4.0

Committed! Thanks Hari!

> File Channel should support verifying integrity of individual events.
> -
>
> Key: FLUME-1586
> URL: https://issues.apache.org/jira/browse/FLUME-1586
> Project: Flume
>  Issue Type: Improvement
>  Components: Channel
>Affects Versions: v1.2.0
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Fix For: v1.4.0
>
> Attachments: FLUME-1586-2.patch, FLUME-1586-2-rebased.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (FLUME-2079) Create filechannel v2

2013-06-11 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2079:
---

 Summary: Create filechannel v2
 Key: FLUME-2079
 URL: https://issues.apache.org/jira/browse/FLUME-2079
 Project: Flume
  Issue Type: Sub-task
Reporter: Brock Noland


The file channel is stable, and since we are going to be making changes to it 
we should fork the code as File Channel v2.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (FLUME-2080) Create file channel file system abstraction

2013-06-11 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2080:
---

 Summary: Create file channel file system abstraction
 Key: FLUME-2080
 URL: https://issues.apache.org/jira/browse/FLUME-2080
 Project: Flume
  Issue Type: Sub-task
Reporter: Brock Noland


Testing IO errors is not possible with the current file channel. After 
forking, we should refactor the channel so that all IO goes through an 
abstraction layer, allowing us to inject errors.
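A sketch of the kind of abstraction layer meant here (interface and class 
names invented): production code delegates straight to the real file, while 
tests swap in an implementation that injects failures:

{noformat}
import java.io.IOException;

// All channel IO would go through this invented interface.
interface LogFileIO {
  void write(byte[] record) throws IOException;
  void sync() throws IOException;
}

// Test double that fails after a configurable number of operations,
// letting unit tests exercise the channel's error-handling paths.
class FaultInjectingIO implements LogFileIO {
  private final LogFileIO delegate;
  private int opsUntilFailure;

  FaultInjectingIO(LogFileIO delegate, int opsUntilFailure) {
    this.delegate = delegate;
    this.opsUntilFailure = opsUntilFailure;
  }

  private void maybeFail() throws IOException {
    if (--opsUntilFailure < 0) {
      throw new IOException("injected disk failure");
    }
  }

  @Override
  public void write(byte[] record) throws IOException {
    maybeFail();
    delegate.write(record);
  }

  @Override
  public void sync() throws IOException {
    maybeFail();
    delegate.sync();
  }
}
{noformat}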

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (FLUME-2077) Umbrella JIRA to track file channel improvements

2013-06-11 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2077:


Description: 
The scope of this JIRA is:

1) FLUME-1968 - New format in support of
2) FLUME-2078 - Periodic fsync
3) FLUME-1946 - Tolerate disk failure

> Umbrella JIRA to track file channel improvements
> ---
>
> Key: FLUME-2077
> URL: https://issues.apache.org/jira/browse/FLUME-2077
> Project: Flume
>  Issue Type: Umbrella
>  Components: File Channel
>Reporter: Brock Noland
>
> The scope of this JIRA is:
> 1) FLUME-1968 - New format in support of
> 2) FLUME-2078 - Periodic fsync
> 3) FLUME-1946 - Tolerate disk failure

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (FLUME-2078) FileChannel periodic fsync should be supported

2013-06-11 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2078:
---

 Summary: FileChannel periodic fsync should be supported
 Key: FLUME-2078
 URL: https://issues.apache.org/jira/browse/FLUME-2078
 Project: Flume
  Issue Type: New Feature
  Components: File Channel
Reporter: Brock Noland


It would be nice to have the option to do a periodic fsync as opposed to an 
fsync on every commit. This option would be disabled by default.
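A sketch of what the periodic variant could look like (all names invented): a 
single scheduled task forces the log file to disk on an interval, instead of 
each commit paying for its own fsync:

{noformat}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class PeriodicSync {
  private final ScheduledExecutorService syncExecutor =
      Executors.newSingleThreadScheduledExecutor();

  PeriodicSync(final FileChannel channel, long syncIntervalMillis) {
    syncExecutor.scheduleWithFixedDelay(new Runnable() {
      @Override
      public void run() {
        try {
          // One fsync covers everything written since the last one.
          channel.force(false);
        } catch (IOException e) {
          // Real code would mark the channel as failed here.
        }
      }
    }, syncIntervalMillis, syncIntervalMillis, TimeUnit.MILLISECONDS);
  }

  void close() {
    // A final explicit force() still belongs in close(), since shutdown
    // alone leaves the last interval's writes unsynced.
    syncExecutor.shutdown();
  }
}
{noformat}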

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (FLUME-2077) Umbrella JIRA to track file channel improvements

2013-06-11 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2077:
---

 Summary: Umbrella JIRA to track file channel improvements
 Key: FLUME-2077
 URL: https://issues.apache.org/jira/browse/FLUME-2077
 Project: Flume
  Issue Type: Umbrella
  Components: File Channel
Reporter: Brock Noland




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (FLUME-1586) File Channel should support verifying integrity of individual events.

2013-06-11 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680486#comment-13680486
 ] 

Brock Noland commented on FLUME-1586:
-

[~hshreedharan] can you post the patch on this JIRA?

> File Channel should support verifying integrity of individual events.
> -
>
> Key: FLUME-1586
> URL: https://issues.apache.org/jira/browse/FLUME-1586
> Project: Flume
>  Issue Type: Improvement
>  Components: Channel
>Affects Versions: v1.2.0
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (FLUME-2063) Add Configurable charset to RegexHbaseEventSerializer

2013-06-02 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated FLUME-2063:


Summary: Add Configurable charset to RegexHbaseEventSerializer  (was: add a 
charset setting to RegexHbaseEventSerializer)

> Add Configurable charset to RegexHbaseEventSerializer
> -
>
> Key: FLUME-2063
> URL: https://issues.apache.org/jira/browse/FLUME-2063
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.3.1
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
>Priority: Minor
> Fix For: v1.4.0
>
> Attachments: 
> 0001-FLUME-2063.-add-a-charset-setting-to-RegexHbaseEvent.patch
>
>
> Since HBase operates at the level of byte arrays Flume's 
> RegexHbaseEventSerializer has to specify the encoding charset for strings. It 
> is currently hardcoded to UTF-8 but it would be nice if it was tweakable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (FLUME-2063) add a charset setting to RegexHbaseEventSerializer

2013-06-01 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672134#comment-13672134
 ] 

Brock Noland commented on FLUME-2063:
-

Looks good to me. I'd run tests and commit but I am on reserve battery.

> add a charset setting to RegexHbaseEventSerializer
> --
>
> Key: FLUME-2063
> URL: https://issues.apache.org/jira/browse/FLUME-2063
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.3.1
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
>Priority: Minor
> Fix For: v1.4.0
>
> Attachments: 
> 0001-FLUME-2063.-add-a-charset-setting-to-RegexHbaseEvent.patch
>
>
> Since HBase operates at the level of byte arrays Flume's 
> RegexHbaseEventSerializer has to specify the encoding charset for strings. It 
> is currently hardcoded to UTF-8 but it would be nice if it was tweakable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

