Fwd: Flume unable to write to HDFS

2014-10-14 Thread raghuveer sj
Hi,

I have been experimenting with Hadoop 2.4.1 and Flume 1.5.0.1 and am
pretty new to both. My flume-conf.properties is as follows:

agentMe.channels = memory-channel
agentMe.sources = my-source
agentMe.sinks = log-sink hdfs-sink

agentMe.channels.memory-channel.type = memory
agentMe.channels.memory-channel.capacity = 1000
agentMe.channels.memory-channel.transactionCapacity = 100

agentMe.sources.my-source.type = syslogtcp
#agentMe.sources.my-source.bind = 192.168.7.129
agentMe.sources.my-source.port = 8100
agentMe.sources.my-source.channels = memory-channel

# Define a sink that outputs to logger.
agentMe.sinks.log-sink.channel = memory-channel
agentMe.sinks.log-sink.type = logger

# Define a sink that outputs to hdfs.
agentMe.sinks.hdfs-sink.channel = memory-channel
agentMe.sinks.hdfs-sink.type = hdfs
agentMe.sinks.hdfs-sink.hdfs.path = hdfs://localhost:54310/user/raghuveer/science
agentMe.sinks.hdfs-sink.hdfs.fileType = DataStream
agentMe.sinks.hdfs-sink.hdfs.batchSize = 2
agentMe.sinks.hdfs-sink.hdfs.rollCount = 0
agentMe.sinks.hdfs-sink.hdfs.rollSize = 0
agentMe.sinks.hdfs-sink.hdfs.rollInterval = 3
agentMe.sinks.hdfs-sink.hdfs.writeFormat = Text
#agentMe.sinks.hdfs-sink.hdfs.path = /user/raghuveer/%y-%m-%d/%H%M/%S
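One thing worth noting here: the Java client below is built with Flume's default RPC client, which speaks Avro, while this configuration declares the source as syslogtcp. A minimal sketch of an Avro source definition that would match the RPC client, reusing the agent and channel names from the configuration above (the bind address is an assumption):

```properties
# Sketch: an Avro source matching RpcClientFactory.getDefaultInstance
agentMe.sources.my-source.type = avro
agentMe.sources.my-source.bind = 0.0.0.0
agentMe.sources.my-source.port = 8100
agentMe.sources.my-source.channels = memory-channel
```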


and I am trying to send events to this agent with the following client:

public class FlumeClient {

    public static void main(String[] args) {
        MyRpcClientFacade client = new MyRpcClientFacade();
        // Initialize the client with the remote Flume agent's host and port
        client.init("192.X.X.54", 8100);

        // Send 10 events to the remote Flume agent. That agent should be
        // configured to listen with an AvroSource.
        String sampleData = "Hello Flume from Raghuveer...!";
        for (int i = 0; i < 10; i++) {
            client.sendDataToFlume(sampleData);
        }

        client.cleanUp();
    }
}



import java.nio.charset.Charset;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class MyRpcClientFacade {

    private RpcClient client;
    private String hostname;
    private int port;

    public void init(String hostname, int port) {
        // Set up the RPC connection
        this.hostname = hostname;
        this.port = port;
        this.client = RpcClientFactory.getDefaultInstance(hostname, port);
        // Use the following method instead of the line above to create a
        // Thrift client:
        // this.client = RpcClientFactory.getThriftInstance(hostname, port);
    }

    public void sendDataToFlume(String data) {
        // Create a Flume Event object that encapsulates the sample data
        Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));

        // Send the event
        try {
            client.append(event);
        } catch (EventDeliveryException e) {
            // Clean up and recreate the client
            client.close();
            client = null;
            client = RpcClientFactory.getDefaultInstance(hostname, port);
        }
    }

    public void cleanUp() {
        // Close the RPC connection
        client.close();
    }
}

When I run the client, the HDFS sink fails with the following exception:

ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor]
(org.apache.flume.sink.hdfs.HDFSEventSink.process:467)  - process failed
java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:214)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2365)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:270)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:262)
at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:718)
at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:183)
at org.apache.flume.sink.hdfs.BucketWriter.access$1700(BucketWriter.java:59)
at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:715)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
13 Oct 2014 12:53:38,685 ERROR [SinkRunner-PollingR
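An aside on this trace, not stated in the thread itself: an UnsupportedOperationException from FileSystem.getScheme during loadFileSystems is commonly reported when Hadoop jars of different versions (for example an old hadoop-core alongside Hadoop 2.x hadoop-common/hadoop-hdfs) end up mixed on the Flume agent's classpath. One way to spot such a mismatch is to list the distinct versions among the hadoop-* jars the agent loads; the jar list below is simulated for illustration, and in practice would come from something like `ls "$FLUME_HOME/lib"`:

```shell
# Simulated jar list; replace with the jars actually on the agent classpath.
jars="hadoop-common-2.4.1.jar
hadoop-hdfs-2.4.1.jar
hadoop-core-1.2.1.jar"

# Extract the version suffix from each jar name and keep distinct values.
versions=$(printf '%s\n' "$jars" | sed -n 's/.*-\([0-9][0-9.]*\)\.jar$/\1/p' | sort -u)
echo "$versions"
# More than one distinct version here is a likely cause of this error.
```

In this simulated listing the command prints both 1.2.1 and 2.4.1, flagging the mix.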

[jira] [Commented] (FLUME-2497) TCP and UDP syslog sources parsing the timestamp incorrectly

2014-10-14 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171955#comment-14171955
 ] 

Hari Shreedharan commented on FLUME-2497:
-

[~mpercy] - Could you please take a look? I am not too familiar with this code.

> TCP and UDP syslog sources parsing the timestamp incorrectly
> 
>
> Key: FLUME-2497
> URL: https://issues.apache.org/jira/browse/FLUME-2497
> Project: Flume
>  Issue Type: Bug
>Affects Versions: v1.5.0.1
>Reporter: Johny Rufus
>Assignee: Johny Rufus
> Fix For: v1.6.0
>
> Attachments: FLUME-2497.patch
>
>
> TCP and UDP syslog sources parse the timestamp incorrectly when using the
> SyslogUtils extractEvent and buildEvent methods.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-1334) Write an startscript for flume agents on Windows

2014-10-14 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated FLUME-1334:
---
Affects Version/s: v1.5.0.1

> Write an startscript for flume agents on Windows
> 
>
> Key: FLUME-1334
> URL: https://issues.apache.org/jira/browse/FLUME-1334
> Project: Flume
>  Issue Type: Improvement
>  Components: Configuration, Windows
>Affects Versions: v1.2.0, v1.4.0, v1.5.0.1
>Reporter: Alexander Alten-Lorenz
>Assignee: Roshan Naik
>  Labels: features, windows
> Attachments: FLUME-1334.patch, FLUME-1334.v2.patch, 
> FLUME-1334.v3.patch, FLUME-1334.v5.patch, flume-ng.ps1.txt
>
>
> Write a start script with service-level integration to run on Windows. 
> Targeted versions: Windows Server 2008 / Windows 7 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLUME-1334) Write an startscript for flume agents on Windows

2014-10-14 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated FLUME-1334:
---
Attachment: FLUME-1334.v5.patch

> Write an startscript for flume agents on Windows
> 
>
> Key: FLUME-1334
> URL: https://issues.apache.org/jira/browse/FLUME-1334
> Project: Flume
>  Issue Type: Improvement
>  Components: Configuration, Windows
>Affects Versions: v1.2.0, v1.4.0, v1.5.0.1
>Reporter: Alexander Alten-Lorenz
>Assignee: Roshan Naik
>  Labels: features, windows
> Attachments: FLUME-1334.patch, FLUME-1334.v2.patch, 
> FLUME-1334.v3.patch, FLUME-1334.v5.patch, flume-ng.ps1.txt
>
>
> Write a start script with service-level integration to run on Windows. 
> Targeted versions: Windows Server 2008 / Windows 7 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-1710) JSONEvent.getBody should not return null

2014-10-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171944#comment-14171944
 ] 

Hudson commented on FLUME-1710:
---

FAILURE: Integrated in flume-trunk #676 (See 
[https://builds.apache.org/job/flume-trunk/676/])
FLUME-1710. JSONEvent.getBody should not return null (hshreedharan: 
http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=aa6fb7fbd9273c905a242c045f99a5b114fb3dc0)
* flume-ng-sdk/src/test/java/org/apache/flume/event/TestEventBuilder.java
* flume-ng-sdk/src/main/java/org/apache/flume/event/JSONEvent.java


> JSONEvent.getBody should not return null
> 
>
> Key: FLUME-1710
> URL: https://issues.apache.org/jira/browse/FLUME-1710
> Project: Flume
>  Issue Type: Improvement
>Affects Versions: v1.4.0
>Reporter: Brock Noland
>Assignee: Ashish Paliwal
>Priority: Trivial
> Attachments: FLUME-1710-0.patch
>
>
> Currently if the charset is not supported, it returns null. We should 
> propagate that error instead of returning null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2482) Race condition in File Channels' Log.removeOldLogs

2014-10-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171942#comment-14171942
 ] 

Hudson commented on FLUME-2482:
---

FAILURE: Integrated in flume-trunk #676 (See 
[https://builds.apache.org/job/flume-trunk/676/])
FLUME-2482. File Channel tests must disable scheduled checkpoint to avoid a 
race condition with forced checkpoint. (hshreedharan: 
http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=a582c100f5f0b368a6dcc77c2b29138ef4b28840)
* 
flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannelRestart.java


> Race condition in File Channels' Log.removeOldLogs
> --
>
> Key: FLUME-2482
> URL: https://issues.apache.org/jira/browse/FLUME-2482
> Project: Flume
>  Issue Type: Bug
>  Components: File Channel
>Affects Versions: v1.5.0.1
>Reporter: Santiago M. Mola
>  Labels: race-condition
> Fix For: v1.6.0
>
> Attachments: FLUME-2482.patch
>
>
> TestFileChannelRestart.testToggleCheckpointCompressionFromFalseToTrue 
> sometimes produces a race condition at Log.removeOldLogs.
> https://travis-ci.org/Stratio/flume/jobs/36782318#L6193
> testToggleCheckpointCompressionFromFalseToTrue(org.apache.flume.channel.file.TestFileChannelRestart)
>  Time elapsed: 144362 sec <<< ERROR!
> java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
> at java.util.ArrayList$Itr.next(ArrayList.java:831)
> at org.apache.flume.channel.file.Log.removeOldLogs(Log.java:1070)
> at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1055)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.fest.reflect.method.Invoker.invoke(Invoker.java:110)
> at org.apache.flume.channel.file.TestUtils.forceCheckpoint(TestUtils.java:134)
> at 
> org.apache.flume.channel.file.TestFileChannelRestart.restartToggleCompression(TestFileChannelRestart.java:930)
> at 
> org.apache.flume.channel.file.TestFileChannelRestart.testToggleCheckpointCompressionFromFalseToTrue(TestFileChannelRestart.java:896)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> at 
> org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
> at 
> org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)



--
This messag

[jira] [Commented] (FLUME-2126) Problem in elasticsearch sink when the event body is a complex field

2014-10-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171943#comment-14171943
 ] 

Hudson commented on FLUME-2126:
---

FAILURE: Integrated in flume-trunk #676 (See 
[https://builds.apache.org/job/flume-trunk/676/])
FLUME-2126. Problem in elasticsearch sink when the event body is a complex 
field (hshreedharan: 
http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=8328bccd41077d457cab064541127fc993e97619)
* 
flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchSink.java
* 
flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java


> Problem in elasticsearch sink when the event body is a complex field
> 
>
> Key: FLUME-2126
> URL: https://issues.apache.org/jira/browse/FLUME-2126
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
> Environment: 1.3.1 and 1.4
>Reporter: Massimo Paladin
>Assignee: Ashish Paliwal
> Attachments: FLUME-2126-0.patch
>
>
> I have found a bug in the elasticsearch sink. The problem is in the 
> {{ContentBuilderUtil.addComplexField}} method: when it calls 
> {{builder.field(fieldName, tmp);}}, the {{tmp}} object is passed as an {{Object}} 
> and ends up serialized via its {{toString}} method by the 
> {{XContentBuilder}}. In the end you get the object reference as content.
> The following change works around the problem for me; the bad point is that it 
> has to parse the content twice. I guess there is a better way to solve the 
> problem, but I am not an elasticsearch API expert. 
> {code}
> --- 
> a/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> +++ 
> b/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> @@ -61,7 +61,12 @@ public class ContentBuilderUtil {
>parser = XContentFactory.xContent(contentType).createParser(data);
>parser.nextToken();
>tmp.copyCurrentStructure(parser);
> -  builder.field(fieldName, tmp);
> +
> +  // if it is a valid structure then we include it
> +  parser = XContentFactory.xContent(contentType).createParser(data);
> +  parser.nextToken();
> +  builder.field(fieldName);
> +  builder.copyCurrentStructure(parser);
>  } catch (JsonParseException ex) {
>// If we get an exception here the most likely cause is nested JSON 
> that
>// can't be figured out in the body. At this point just push it through
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: flume-trunk #676

2014-10-14 Thread Apache Jenkins Server
See 

Changes:

[hshreedharan] FLUME-2126. Problem in elasticsearch sink when the event body is 
a complex field

[hshreedharan] FLUME-1710. JSONEvent.getBody should not return null

[hshreedharan] FLUME-2482. File Channel tests must disable scheduled checkpoint 
to avoid a race condition with forced checkpoint.

--
[...truncated 4920 lines...]
[ERROR] 
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :flume-ng-tests
[JENKINS] Archiving 

 to 
org.apache.flume.flume-ng-sinks/flume-ng-kafka-sink/1.6.0-SNAPSHOT/flume-ng-kafka-sink-1.6.0-SNAPSHOT.pom
[JENKINS] Archiving 

 to 
org.apache.flume.flume-ng-sinks/flume-ng-kafka-sink/1.6.0-SNAPSHOT/flume-ng-kafka-sink-1.6.0-SNAPSHOT.jar
Sending artifact delta relative to flume-trunk » Flume Kafka Sink #673
Archived 2 artifacts
Archive block size is 32768
Received 0 blocks and 17269 bytes
Compression is 0.0%
Took 0.44 sec
[JENKINS] Archiving 

 to 
org.apache.flume.flume-ng-legacy-sources/flume-thrift-source/1.6.0-SNAPSHOT/flume-thrift-source-1.6.0-SNAPSHOT.pom
[JENKINS] Archiving 

 to 
org.apache.flume.flume-ng-legacy-sources/flume-thrift-source/1.6.0-SNAPSHOT/flume-thrift-source-1.6.0-SNAPSHOT.jar
Sending artifact delta relative to flume-trunk » Flume legacy Thrift Source #673
Archived 2 artifacts
Archive block size is 32768
Received 0 blocks and 62939 bytes
Compression is 0.0%
Took 0.2 sec
[JENKINS] Archiving 

 to 
org.apache.flume.flume-ng-channels/flume-spillable-memory-channel/1.6.0-SNAPSHOT/flume-spillable-memory-channel-1.6.0-SNAPSHOT.pom
[JENKINS] Archiving 

 to 
org.apache.flume.flume-ng-channels/flume-spillable-memory-channel/1.6.0-SNAPSHOT/flume-spillable-memory-channel-1.6.0-SNAPSHOT.jar
Sending artifact delta relative to flume-trunk » Flume NG Spillable Memory 
channel #673
Archived 2 artifacts
Archive block size is 32768
Received 0 blocks and 25762 bytes
Compression is 0.0%
Took 1.9 sec
[JENKINS] Archiving 
 to 
org.apache.flume/flume-ng-dist/1.6.0-SNAPSHOT/flume-ng-dist-1.6.0-SNAPSHOT.pom
[JENKINS] Archiving 

 to 
org.apache.flume/flume-ng-dist/1.6.0-SNAPSHOT/flume-ng-dist-1.6.0-SNAPSHOT-bin.tar.gz
[JENKINS] Archiving 

 to 
org.apache.flume/flume-ng-dist/1.6.0-SNAPSHOT/flume-ng-dist-1.6.0-SNAPSHOT-src.tar.gz
Sending artifact delta relative to flume-trunk » Flume NG distribution #673
Archived 3 artifacts
Archive block size is 32768
Received 0 blocks and 35461000 bytes
Compression is 0.0%
Took 21 sec
[JENKINS] Archiving 
 
to 
org.apache.flume/flume-ng-legacy-sources/1.6.0-SNAPSHOT/flume-ng-legacy-sources-1.6.0-SNAPSHOT.pom
Sending artifact delta relative to flume-trunk » Flume legacy Sources #673
Archived 1 artifacts
Archive block size is 32768
Received 0 blocks and 1752 bytes
Compression is 0.0%
Took 73 ms
[JENKINS] Archiving 

 to 
org.apache.flume.flume-ng-sources/flume-twitter-source/1.6.0-SNAPSHOT/flume-twitter-source-1.6.0-SNAPSHOT.pom
[JENKINS] Archiving 

 to 
org.apache.flume.flume-ng-sources/flume-twitter-source/1.6.0-SNAPSHOT/flume-twitter-source-1.6.0-SNAPSHOT.jar
Sending artifact delta relative to flume-trunk » Flume Twitter Source #673
Archived 2 artifacts
Archive block size is 32768
Received 0 blocks and 17029 bytes
Compression is 0.0%
Took 63 ms
[JENKINS] Archiving 
 to 
org.apa

[jira] [Commented] (FLUME-1710) JSONEvent.getBody should not return null

2014-10-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171910#comment-14171910
 ] 

Hudson commented on FLUME-1710:
---

UNSTABLE: Integrated in Flume-trunk-hbase-98 #35 (See 
[https://builds.apache.org/job/Flume-trunk-hbase-98/35/])
FLUME-1710. JSONEvent.getBody should not return null (hshreedharan: 
http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=aa6fb7fbd9273c905a242c045f99a5b114fb3dc0)
* flume-ng-sdk/src/test/java/org/apache/flume/event/TestEventBuilder.java
* flume-ng-sdk/src/main/java/org/apache/flume/event/JSONEvent.java


> JSONEvent.getBody should not return null
> 
>
> Key: FLUME-1710
> URL: https://issues.apache.org/jira/browse/FLUME-1710
> Project: Flume
>  Issue Type: Improvement
>Affects Versions: v1.4.0
>Reporter: Brock Noland
>Assignee: Ashish Paliwal
>Priority: Trivial
> Attachments: FLUME-1710-0.patch
>
>
> Currently if the charset is not supported, it returns null. We should 
> propagate that error instead of returning null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2482) Race condition in File Channels' Log.removeOldLogs

2014-10-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171909#comment-14171909
 ] 

Hudson commented on FLUME-2482:
---

UNSTABLE: Integrated in Flume-trunk-hbase-98 #35 (See 
[https://builds.apache.org/job/Flume-trunk-hbase-98/35/])
FLUME-2482. File Channel tests must disable scheduled checkpoint to avoid a 
race condition with forced checkpoint. (hshreedharan: 
http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=a582c100f5f0b368a6dcc77c2b29138ef4b28840)
* 
flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannelRestart.java


> Race condition in File Channels' Log.removeOldLogs
> --
>
> Key: FLUME-2482
> URL: https://issues.apache.org/jira/browse/FLUME-2482
> Project: Flume
>  Issue Type: Bug
>  Components: File Channel
>Affects Versions: v1.5.0.1
>Reporter: Santiago M. Mola
>  Labels: race-condition
> Fix For: v1.6.0
>
> Attachments: FLUME-2482.patch
>
>
> TestFileChannelRestart.testToggleCheckpointCompressionFromFalseToTrue 
> sometimes produces a race condition at Log.removeOldLogs.
> https://travis-ci.org/Stratio/flume/jobs/36782318#L6193
> testToggleCheckpointCompressionFromFalseToTrue(org.apache.flume.channel.file.TestFileChannelRestart)
>  Time elapsed: 144362 sec <<< ERROR!
> java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
> at java.util.ArrayList$Itr.next(ArrayList.java:831)
> at org.apache.flume.channel.file.Log.removeOldLogs(Log.java:1070)
> at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1055)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.fest.reflect.method.Invoker.invoke(Invoker.java:110)
> at org.apache.flume.channel.file.TestUtils.forceCheckpoint(TestUtils.java:134)
> at 
> org.apache.flume.channel.file.TestFileChannelRestart.restartToggleCompression(TestFileChannelRestart.java:930)
> at 
> org.apache.flume.channel.file.TestFileChannelRestart.testToggleCheckpointCompressionFromFalseToTrue(TestFileChannelRestart.java:896)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> at 
> org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
> at 
> org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

Jenkins build is still unstable: Flume-trunk-hbase-98 #35

2014-10-14 Thread Apache Jenkins Server
See 



Build failed in Jenkins: flume-trunk #675

2014-10-14 Thread Apache Jenkins Server
See 

--
Failed to access build log

java.io.IOException: remote file operation failed: 
/home/jenkins/jenkins-slave/workspace/flume-trunk at 
hudson.remoting.Channel@2b0d30ba:ubuntu-2
at hudson.FilePath.act(FilePath.java:910)
at hudson.FilePath.act(FilePath.java:887)
at hudson.FilePath.toURI(FilePath.java:1036)
at hudson.tasks.MailSender.createFailureMail(MailSender.java:278)
at hudson.tasks.MailSender.getMail(MailSender.java:153)
at hudson.tasks.MailSender.execute(MailSender.java:101)
at 
hudson.maven.MavenModuleSetBuild$MavenModuleSetBuildExecution.cleanUp(MavenModuleSetBuild.java:1058)
at hudson.model.Run.execute(Run.java:1721)
at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:529)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:231)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:524)
at hudson.remoting.Request.call(Request.java:129)
at hudson.remoting.Channel.call(Channel.java:722)
at hudson.FilePath.act(FilePath.java:903)
... 10 more
Caused by: java.io.IOException
at hudson.remoting.Channel.close(Channel.java:1007)
at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
at hudson.remoting.PingThread.ping(PingThread.java:120)
at hudson.remoting.PingThread.run(PingThread.java:81)
Caused by: java.util.concurrent.TimeoutException: Ping started on 1413323617282 
hasn't completed at 1413323857283
... 2 more


Jenkins build became unstable: Flume-trunk-hbase-98 #34

2014-10-14 Thread Apache Jenkins Server
See 



[jira] [Commented] (FLUME-2126) Problem in elasticsearch sink when the event body is a complex field

2014-10-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171862#comment-14171862
 ] 

Hudson commented on FLUME-2126:
---

UNSTABLE: Integrated in Flume-trunk-hbase-98 #34 (See 
[https://builds.apache.org/job/Flume-trunk-hbase-98/34/])
FLUME-2126. Problem in elasticsearch sink when the event body is a complex 
field (hshreedharan: 
http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=8328bccd41077d457cab064541127fc993e97619)
* 
flume-ng-sinks/flume-ng-elasticsearch-sink/src/test/java/org/apache/flume/sink/elasticsearch/TestElasticSearchSink.java
* 
flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java


> Problem in elasticsearch sink when the event body is a complex field
> 
>
> Key: FLUME-2126
> URL: https://issues.apache.org/jira/browse/FLUME-2126
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
> Environment: 1.3.1 and 1.4
>Reporter: Massimo Paladin
>Assignee: Ashish Paliwal
> Attachments: FLUME-2126-0.patch
>
>
> I have found a bug in the elasticsearch sink. The problem is in the 
> {{ContentBuilderUtil.addComplexField}} method: when it calls 
> {{builder.field(fieldName, tmp);}}, the {{tmp}} object is passed as an {{Object}} 
> and ends up serialized via its {{toString}} method by the 
> {{XContentBuilder}}. In the end you get the object reference as content.
> The following change works around the problem for me; the bad point is that it 
> has to parse the content twice. I guess there is a better way to solve the 
> problem, but I am not an elasticsearch API expert. 
> {code}
> --- 
> a/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> +++ 
> b/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> @@ -61,7 +61,12 @@ public class ContentBuilderUtil {
>parser = XContentFactory.xContent(contentType).createParser(data);
>parser.nextToken();
>tmp.copyCurrentStructure(parser);
> -  builder.field(fieldName, tmp);
> +
> +  // if it is a valid structure then we include it
> +  parser = XContentFactory.xContent(contentType).createParser(data);
> +  parser.nextToken();
> +  builder.field(fieldName);
> +  builder.copyCurrentStructure(parser);
>  } catch (JsonParseException ex) {
>// If we get an exception here the most likely cause is nested JSON 
> that
>// can't be figured out in the body. At this point just push it through
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2482) Race condition in File Channels' Log.removeOldLogs

2014-10-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171839#comment-14171839
 ] 

ASF subversion and git services commented on FLUME-2482:


Commit 150dbc519d0552ea94e1d6de2d1b455eae022155 in flume's branch 
refs/heads/flume-1.6 from [~hshreedharan]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=150dbc5 ]

FLUME-2482. File Channel tests must disable scheduled checkpoint to avoid a 
race condition with forced checkpoint.

(Johny Rufus via Hari)


> Race condition in File Channels' Log.removeOldLogs
> --
>
> Key: FLUME-2482
> URL: https://issues.apache.org/jira/browse/FLUME-2482
> Project: Flume
>  Issue Type: Bug
>  Components: File Channel
>Affects Versions: v1.5.0.1
>Reporter: Santiago M. Mola
>  Labels: race-condition
> Fix For: v1.6.0
>
> Attachments: FLUME-2482.patch
>
>
> TestFileChannelRestart.testToggleCheckpointCompressionFromFalseToTrue 
> sometimes produces a race condition at Log.removeOldLogs.
> https://travis-ci.org/Stratio/flume/jobs/36782318#L6193
> testToggleCheckpointCompressionFromFalseToTrue(org.apache.flume.channel.file.TestFileChannelRestart)
>  Time elapsed: 144362 sec <<< ERROR!
> java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
> at java.util.ArrayList$Itr.next(ArrayList.java:831)
> at org.apache.flume.channel.file.Log.removeOldLogs(Log.java:1070)
> at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1055)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.fest.reflect.method.Invoker.invoke(Invoker.java:110)
> at org.apache.flume.channel.file.TestUtils.forceCheckpoint(TestUtils.java:134)
> at 
> org.apache.flume.channel.file.TestFileChannelRestart.restartToggleCompression(TestFileChannelRestart.java:930)
> at 
> org.apache.flume.channel.file.TestFileChannelRestart.testToggleCheckpointCompressionFromFalseToTrue(TestFileChannelRestart.java:896)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> at 
> org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
> at 
> org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
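The ConcurrentModificationException above is the classic fail-fast symptom of a list being structurally modified while an iterator over it is still active (here, the scheduled checkpoint racing the forced checkpoint). A minimal single-threaded Java sketch of the same failure mode and a safe alternative; this is illustrative only, not Flume's actual Log code:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class CmeDemo {
    public static void main(String[] args) {
        List<Integer> logIds = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            logIds.add(i);
        }

        // Buggy pattern: structurally modifying the list while a for-each
        // iterator is active fails fast with ConcurrentModificationException,
        // analogous to removeOldLogs racing another checkpoint.
        boolean threw = false;
        try {
            for (Integer id : logIds) {
                if (id < 3) {
                    logIds.remove(id); // modifies the list mid-iteration
                }
            }
        } catch (ConcurrentModificationException e) {
            threw = true;
        }
        System.out.println("threw=" + threw);

        // Safe pattern: let the collection perform the removal itself.
        List<Integer> safeIds = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            safeIds.add(i);
        }
        safeIds.removeIf(id -> id < 3);
        System.out.println(safeIds); // [3, 4]
    }
}
```

In Flume's case the two iterations ran on different threads, so the committed fix disables the scheduled checkpoint in tests rather than changing the iteration itself.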





[jira] [Commented] (FLUME-2482) Race condition in File Channels' Log.removeOldLogs

2014-10-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171838#comment-14171838
 ] 

ASF subversion and git services commented on FLUME-2482:


Commit a582c100f5f0b368a6dcc77c2b29138ef4b28840 in flume's branch 
refs/heads/trunk from [~hshreedharan]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=a582c10 ]

FLUME-2482. File Channel tests must disable scheduled checkpoint to avoid a 
race condition with forced checkpoint.

(Johny Rufus via Hari)


> Race condition in File Channels' Log.removeOldLogs
> --
>
> Key: FLUME-2482
> URL: https://issues.apache.org/jira/browse/FLUME-2482
> Project: Flume
>  Issue Type: Bug
>  Components: File Channel
>Affects Versions: v1.5.0.1
>Reporter: Santiago M. Mola
>  Labels: race-condition
> Fix For: v1.6.0
>
> Attachments: FLUME-2482.patch
>
>
> TestFileChannelRestart.testToggleCheckpointCompressionFromFalseToTrue 
> sometimes produces a race condition at Log.removeOldLogs.
> https://travis-ci.org/Stratio/flume/jobs/36782318#L6193





[jira] [Commented] (FLUME-2482) Race condition in File Channels' Log.removeOldLogs

2014-10-14 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171837#comment-14171837
 ] 

Hari Shreedharan commented on FLUME-2482:
-

+1. I am committing this one.

> Race condition in File Channels' Log.removeOldLogs
> --
>
> Key: FLUME-2482
> URL: https://issues.apache.org/jira/browse/FLUME-2482
> Project: Flume
>  Issue Type: Bug
>  Components: File Channel
>Affects Versions: v1.5.0.1
>Reporter: Santiago M. Mola
>  Labels: race-condition
> Fix For: v1.6.0
>
> Attachments: FLUME-2482.patch
>
>
> TestFileChannelRestart.testToggleCheckpointCompressionFromFalseToTrue 
> sometimes produces a race condition at Log.removeOldLogs.
> https://travis-ci.org/Stratio/flume/jobs/36782318#L6193





[jira] [Commented] (FLUME-1710) JSONEvent.getBody should not return null

2014-10-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171830#comment-14171830
 ] 

ASF subversion and git services commented on FLUME-1710:


Commit 93ff446ff00f6044e4b58f4594a47c81caf16ddf in flume's branch 
refs/heads/flume-1.6 from [~hshreedharan]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=93ff446 ]

FLUME-1710. JSONEvent.getBody should not return null

(Ashish Paliwal via Hari)


> JSONEvent.getBody should not return null
> 
>
> Key: FLUME-1710
> URL: https://issues.apache.org/jira/browse/FLUME-1710
> Project: Flume
>  Issue Type: Improvement
>Affects Versions: v1.4.0
>Reporter: Brock Noland
>Assignee: Ashish Paliwal
>Priority: Trivial
> Attachments: FLUME-1710-0.patch
>
>
> Currently if the charset is not supported, it returns null. We should 
> propagate that error instead of returning null.





[jira] [Commented] (FLUME-1710) JSONEvent.getBody should not return null

2014-10-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171829#comment-14171829
 ] 

ASF subversion and git services commented on FLUME-1710:


Commit aa6fb7fbd9273c905a242c045f99a5b114fb3dc0 in flume's branch 
refs/heads/trunk from [~hshreedharan]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=aa6fb7f ]

FLUME-1710. JSONEvent.getBody should not return null

(Ashish Paliwal via Hari)


> JSONEvent.getBody should not return null
> 
>
> Key: FLUME-1710
> URL: https://issues.apache.org/jira/browse/FLUME-1710
> Project: Flume
>  Issue Type: Improvement
>Affects Versions: v1.4.0
>Reporter: Brock Noland
>Assignee: Ashish Paliwal
>Priority: Trivial
> Attachments: FLUME-1710-0.patch
>
>
> Currently if the charset is not supported, it returns null. We should 
> propagate that error instead of returning null.





Build failed in Jenkins: flume-trunk #674

2014-10-14 Thread Apache Jenkins Server
See 

--
Failed to access build log

java.io.IOException: remote file operation failed: 
/home/jenkins/jenkins-slave/workspace/flume-trunk at 
hudson.remoting.Channel@2b0d30ba:ubuntu-2
at hudson.FilePath.act(FilePath.java:910)
at hudson.FilePath.act(FilePath.java:887)
at hudson.FilePath.toURI(FilePath.java:1036)
at hudson.tasks.MailSender.createFailureMail(MailSender.java:278)
at hudson.tasks.MailSender.getMail(MailSender.java:153)
at hudson.tasks.MailSender.execute(MailSender.java:101)
at 
hudson.maven.MavenModuleSetBuild$MavenModuleSetBuildExecution.cleanUp(MavenModuleSetBuild.java:1058)
at hudson.model.Run.execute(Run.java:1721)
at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:529)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:231)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:524)
at hudson.remoting.Request.call(Request.java:129)
at hudson.remoting.Channel.call(Channel.java:722)
at hudson.FilePath.act(FilePath.java:903)
... 10 more
Caused by: java.io.IOException
at hudson.remoting.Channel.close(Channel.java:1007)
at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
at hudson.remoting.PingThread.ping(PingThread.java:120)
at hudson.remoting.PingThread.run(PingThread.java:81)
Caused by: java.util.concurrent.TimeoutException: Ping started on 1413323617282 
hasn't completed at 1413323857283
... 2 more


[jira] [Commented] (FLUME-1710) JSONEvent.getBody should not return null

2014-10-14 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171793#comment-14171793
 ] 

Hari Shreedharan commented on FLUME-1710:
-

+1. Running tests and committing.

> JSONEvent.getBody should not return null
> 
>
> Key: FLUME-1710
> URL: https://issues.apache.org/jira/browse/FLUME-1710
> Project: Flume
>  Issue Type: Improvement
>Affects Versions: v1.4.0
>Reporter: Brock Noland
>Assignee: Ashish Paliwal
>Priority: Trivial
> Attachments: FLUME-1710-0.patch
>
>
> Currently if the charset is not supported, it returns null. We should 
> propagate that error instead of returning null.





[jira] [Commented] (FLUME-2126) Problem in elasticsearch sink when the event body is a complex field

2014-10-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171772#comment-14171772
 ] 

ASF subversion and git services commented on FLUME-2126:


Commit 8328bccd41077d457cab064541127fc993e97619 in flume's branch 
refs/heads/trunk from [~hshreedharan]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=8328bcc ]

FLUME-2126. Problem in elasticsearch sink when the event body is a complex field

(Ashish Paliwal via Hari)


> Problem in elasticsearch sink when the event body is a complex field
> 
>
> Key: FLUME-2126
> URL: https://issues.apache.org/jira/browse/FLUME-2126
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
> Environment: 1.3.1 and 1.4
>Reporter: Massimo Paladin
>Assignee: Ashish Paliwal
> Attachments: FLUME-2126-0.patch
>
>
> I have found a bug in the Elasticsearch sink. The problem is in the 
> {{ContentBuilderUtil.addComplexField}} method: when it calls 
> {{builder.field(fieldName, tmp);}}, the {{tmp}} object is taken as an 
> {{Object}} and ends up serialized via its {{toString}} method in the 
> {{XContentBuilder}}, so you get the object reference as the content.
> The following change works around the problem for me; the downside is that it 
> has to parse the content twice. I guess there is a better way to solve the 
> problem, but I am not an Elasticsearch API expert. 
> {code}
> --- 
> a/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> +++ 
> b/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> @@ -61,7 +61,12 @@ public class ContentBuilderUtil {
>parser = XContentFactory.xContent(contentType).createParser(data);
>parser.nextToken();
>tmp.copyCurrentStructure(parser);
> -  builder.field(fieldName, tmp);
> +
> +  // if it is a valid structure then we include it
> +  parser = XContentFactory.xContent(contentType).createParser(data);
> +  parser.nextToken();
> +  builder.field(fieldName);
> +  builder.copyCurrentStructure(parser);
>  } catch (JsonParseException ex) {
>// If we get an exception here the most likely cause is nested JSON 
> that
>// can't be figured out in the body. At this point just push it through
> {code}
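The "object reference as content" symptom comes from Java's default Object.toString(), which renders an object as ClassName@hexHashCode. A self-contained sketch of the pitfall; ParsedBody is a hypothetical stand-in, not the real Elasticsearch XContentBuilder type:

```java
public class ToStringPitfall {
    // Hypothetical stand-in for a parsed JSON structure; NOT the real
    // Elasticsearch XContentBuilder / XContentParser API.
    static class ParsedBody {
        int count = 1;
    }

    public static void main(String[] args) {
        ParsedBody tmp = new ParsedBody();

        // Handing the object to an API that only sees Object falls back to
        // Object.toString(), i.e. "ClassName@hexHashCode" - the "object
        // reference as content" symptom from the bug report.
        String rendered = "body=" + tmp;
        System.out.println(rendered);

        boolean looksLikeReference =
            rendered.matches("body=.*ParsedBody@[0-9a-f]+");
        System.out.println("looksLikeReference=" + looksLikeReference);
    }
}
```

The patch avoids this by re-parsing the raw bytes and copying the token structure into the builder, so the JSON tree is serialized rather than the Java object.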





[jira] [Commented] (FLUME-2126) Problem in elasticsearch sink when the event body is a complex field

2014-10-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171773#comment-14171773
 ] 

ASF subversion and git services commented on FLUME-2126:


Commit 5093d98189a15e82b223597eb24491cbcb7340db in flume's branch 
refs/heads/flume-1.6 from [~hshreedharan]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=5093d98 ]

FLUME-2126. Problem in elasticsearch sink when the event body is a complex field

(Ashish Paliwal via Hari)


> Problem in elasticsearch sink when the event body is a complex field
> 
>
> Key: FLUME-2126
> URL: https://issues.apache.org/jira/browse/FLUME-2126
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
> Environment: 1.3.1 and 1.4
>Reporter: Massimo Paladin
>Assignee: Ashish Paliwal
> Attachments: FLUME-2126-0.patch
>
>
> I have found a bug in the Elasticsearch sink. The problem is in the 
> {{ContentBuilderUtil.addComplexField}} method: when it calls 
> {{builder.field(fieldName, tmp);}}, the {{tmp}} object is taken as an 
> {{Object}} and ends up serialized via its {{toString}} method in the 
> {{XContentBuilder}}, so you get the object reference as the content.
> The following change works around the problem for me; the downside is that it 
> has to parse the content twice. I guess there is a better way to solve the 
> problem, but I am not an Elasticsearch API expert. 
> {code}
> --- 
> a/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> +++ 
> b/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> @@ -61,7 +61,12 @@ public class ContentBuilderUtil {
>parser = XContentFactory.xContent(contentType).createParser(data);
>parser.nextToken();
>tmp.copyCurrentStructure(parser);
> -  builder.field(fieldName, tmp);
> +
> +  // if it is a valid structure then we include it
> +  parser = XContentFactory.xContent(contentType).createParser(data);
> +  parser.nextToken();
> +  builder.field(fieldName);
> +  builder.copyCurrentStructure(parser);
>  } catch (JsonParseException ex) {
>// If we get an exception here the most likely cause is nested JSON 
> that
>// can't be figured out in the body. At this point just push it through
> {code}





[jira] [Commented] (FLUME-2126) Problem in elasticsearch sink when the event body is a complex field

2014-10-14 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171769#comment-14171769
 ] 

Hari Shreedharan commented on FLUME-2126:
-

+1. Committing this

> Problem in elasticsearch sink when the event body is a complex field
> 
>
> Key: FLUME-2126
> URL: https://issues.apache.org/jira/browse/FLUME-2126
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
> Environment: 1.3.1 and 1.4
>Reporter: Massimo Paladin
>Assignee: Ashish Paliwal
> Attachments: FLUME-2126-0.patch
>
>
> I have found a bug in the Elasticsearch sink. The problem is in the 
> {{ContentBuilderUtil.addComplexField}} method: when it calls 
> {{builder.field(fieldName, tmp);}}, the {{tmp}} object is taken as an 
> {{Object}} and ends up serialized via its {{toString}} method in the 
> {{XContentBuilder}}, so you get the object reference as the content.
> The following change works around the problem for me; the downside is that it 
> has to parse the content twice. I guess there is a better way to solve the 
> problem, but I am not an Elasticsearch API expert. 
> {code}
> --- 
> a/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> +++ 
> b/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> @@ -61,7 +61,12 @@ public class ContentBuilderUtil {
>parser = XContentFactory.xContent(contentType).createParser(data);
>parser.nextToken();
>tmp.copyCurrentStructure(parser);
> -  builder.field(fieldName, tmp);
> +
> +  // if it is a valid structure then we include it
> +  parser = XContentFactory.xContent(contentType).createParser(data);
> +  parser.nextToken();
> +  builder.field(fieldName);
> +  builder.copyCurrentStructure(parser);
>  } catch (JsonParseException ex) {
>// If we get an exception here the most likely cause is nested JSON 
> that
>// can't be figured out in the body. At this point just push it through
> {code}





Re: Review Request 18544: Hive Streaming sink

2014-10-14 Thread Brock Noland

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18544/#review56592
---


Hi Roshan,

It looks good and I think we should commit this soon. There are just a couple 
of things we should resolve first. There is a bunch of code that is commented 
out; we should remove that. There are also a number of lint/whitespace issues 
we should fix. Can you go through the patch and change any if(x==y) to 
if (x == y)?


flume-ng-legacy-sources/flume-thrift-source/src/main/java/com/cloudera/flume/handlers/thrift/ThriftFlumeEventServer.java


are these changes due to the rev in the thrift version?



flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveDelimitedTextSerializer.java


extra line



flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveDelimitedTextSerializer.java


extra space



flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveDelimitedTextSerializer.java


spaces



flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveSink.java


map on LHS



flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveSink.java


why is this here when the same thing appears a few lines below, not commented out?



flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveSink.java


why is heartBeatTimer not private?



flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveSink.java


does this really need HashMap on the LHS? Should be Map



flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveSink.java


this

 context.getString(Config.SERIALIZER);
 
 should be
 
  context.getString(Config.SERIALIZER, "").trim();
  
 and then you can check for s.isEmpty() in the next line and not worry 
about nulls.
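The suggestion above (default to "" and trim) makes the read null-safe so the caller only needs isEmpty(). A minimal sketch of the pattern, using a hypothetical getString helper rather than Flume's actual org.apache.flume.Context class:

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigRead {
    // Hypothetical stand-in for Context.getString(key, defaultValue);
    // the real method lives on org.apache.flume.Context.
    static String getString(Map<String, String> ctx, String key, String def) {
        String value = ctx.get(key);
        return value == null ? def : value;
    }

    public static void main(String[] args) {
        Map<String, String> ctx = new HashMap<>();

        // Defaulting to "" and trimming means the result is never null, so
        // the caller can simply test isEmpty() instead of null-checking.
        String serializer = getString(ctx, "serializer", "").trim();
        System.out.println("empty=" + serializer.isEmpty());

        ctx.put("serializer", "  DELIMITED ");
        serializer = getString(ctx, "serializer", "").trim();
        System.out.println(serializer); // DELIMITED
    }
}
```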



flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveSink.java


Why do we have HashMap on LHS, it should be Map



flume-ng-sinks/flume-hive-sink/src/main/java/org/apache/flume/sink/hive/HiveWriter.java


All of these classes below should end with Exception



flume-ng-sinks/flume-hive-sink/src/test/java/org/apache/flume/sink/hive/TestHiveWriter.java


remove commented code



flume-ng-sinks/flume-hive-sink/src/test/java/org/apache/flume/sink/hive/TestUtil.java


We should log exceptions like this.



flume-ng-sinks/flume-hive-sink/src/test/java/org/apache/flume/sink/hive/TestUtil.java


missing spaces
if(partKeys.size()!=partVals.size()) 
should be:
if (partKeys.size() != partVals.size())



flume-ng-sinks/flume-hive-sink/src/test/java/org/apache/flume/sink/hive/TestUtil.java


We should remove commented code


- Brock Noland


On Sept. 3, 2014, 3:19 a.m., Roshan Naik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18544/
> ---
> 
> (Updated Sept. 3, 2014, 3:19 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-1734
> https://issues.apache.org/jira/browse/FLUME-1734
> 
> 
> Repository: flume-git
> 
> 
> Description
> ---
> 
> Hive streaming sink.
> 
> 
> Diffs
> -
> 
>   bin/flume-ng e09e26b 
>   conf/log4j.properties 3918511 
>   
> flume-ng-configuration/src/main/java/org/apache/flume/conf/sink/SinkConfiguration.java
>  ac11558 
>   
> flume-ng-configuration/src/main/java/org/apache/flume/conf/sink/SinkType.java 
> 0a1cd7a 
>   flume-ng-dist/pom.xml 8c18af6 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst a718fbf 
>   
> flume-ng-legacy-sources/flume-thrift-source/src/main/java/com/cloudera/flume/handlers/thrift/ThriftFlumeEventServer.java
>  ff32c45 
>   
> flume-ng-legacy-sources/flume-thrift-source/src/test/java/org/apache/flume/source/thriftLegacy/TestThriftLegacySource.java
>  8e08f22 
>   
> flume-ng-sdk/src/main/java/org/apache/flume/thrift/ThriftSourceProtocol.java 
> 7f966b0 
>   flume-ng-sinks/flume-hive-sink/pom.xml PRE-CREATION 
>   
> flume-n

[jira] [Commented] (FLUME-2497) TCP and UDP syslog sources parsing the timestamp incorrectly

2014-10-14 Thread Johny Rufus (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171642#comment-14171642
 ] 

Johny Rufus commented on FLUME-2497:


Reference: the javadoc for SimpleDateFormat, 
http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html
which specifies that 'S' corresponds to milliseconds. The standard pattern 
that matches the above for parsing is
yyyy-MM-dd'T'HH:mm:ss.S, in which case S = 222357 is taken as milliseconds
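The effect of 'S' meaning milliseconds (not fractional seconds) can be seen directly: with lenient parsing, the six-digit run 222357 is absorbed as 222357 ms and rolls the timestamp forward by roughly 3.7 minutes. A small self-contained demonstration:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class FractionPitfall {
    public static void main(String[] args) throws ParseException {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.S");
        in.setTimeZone(TimeZone.getTimeZone("UTC"));

        // 'S' is milliseconds, not "fractional seconds": the whole digit run
        // 222357 is read as 222357 ms and, with the default lenient Calendar,
        // is carried into the seconds and minutes fields.
        Date parsed = in.parse("2014-10-14T10:20:30.222357");

        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS");
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        System.out.println(out.format(parsed)); // 2014-10-14T10:24:12.357
    }
}
```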

> TCP and UDP syslog sources parsing the timestamp incorrectly
> 
>
> Key: FLUME-2497
> URL: https://issues.apache.org/jira/browse/FLUME-2497
> Project: Flume
>  Issue Type: Bug
>Affects Versions: v1.5.0.1
>Reporter: Johny Rufus
>Assignee: Johny Rufus
> Fix For: v1.6.0
>
> Attachments: FLUME-2497.patch
>
>
> TCP and UDP syslog sources parse the timestamp incorrectly while using 
> Syslogutils extractEvent and buildEvent  methods.





[jira] [Comment Edited] (FLUME-2500) Add a channel that uses Kafka

2014-10-14 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171466#comment-14171466
 ] 

Hari Shreedharan edited comment on FLUME-2500 at 10/14/14 8:24 PM:
---

Roshan:

There are 2 motivations for this channel:

- Be able to provide a distributed channel for Flume. We could implement one 
internally in Flume - which I would definitely prefer, but I don't have the 
cycles to do it, and it would be better operations-wise - or use an existing 
service (like Hazelcast, used by Ashish, or Kafka in this case).
- Given that a user has a Kafka cluster, be able to use Flume's sources and 
sinks to either get data into Kafka from the variety of Flume sources, or write 
to HDFS from Kafka using Flume's sinks. The reason a channel works better than 
the Kafka source is that a dead Flume agent won't affect event delivery - the 
events just get routed via another agent.


was (Author: hshreedharan):
Roshan:

There are 2 motivations for this channel:

#. Be able to provide a distributed channel for Flume - we could implement one 
internally in Flume - which I would definitely prefer but don't have the cycles 
to do, and is better operations-wise - or use an existing service (like 
Hazelcast, used by Ashish, or Kafka in this case)
#. Given that a user has a Kafka cluster, be able to use Flume's sources and 
sinks to either get data into Kafka from the variety of Flume sources, or write 
to HDFS from Kafka using Flume's sinks. The reason a channel works better than 
the Kafka source is that a dead Flume agent won't affect event delivery - the 
events just get routed via another agent.

> Add a channel that uses Kafka 
> --
>
> Key: FLUME-2500
> URL: https://issues.apache.org/jira/browse/FLUME-2500
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
>
> Here is the rationale:
> - Kafka gives us an HA channel, which means a dead agent does not affect the 
> data in the channel - thus reducing delivery delay.
> - Kafka is used by many companies - it would be a good idea to use Flume to 
> pull data from Kafka and write it to HDFS/HBase etc. 
> This channel is not going to be useful for cases where Kafka is not already 
> used, since it brings in the operational overhead of maintaining two systems, 
> but if Kafka is already in use, this is a good way to integrate Kafka and Flume.
> Here is a scratch implementation: 
> https://github.com/harishreedharan/flume/blob/kafka-channel/flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
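For illustration, such a channel could be wired into an agent configuration 
along these lines. This is a hypothetical sketch: the property names and the 
broker/ZooKeeper addresses are assumptions for illustration, not keys defined 
by the scratch implementation above.

{noformat}
agent.channels = kafka-channel
agent.sources.my-source.channels = kafka-channel

# Hypothetical property names - check the KafkaChannel implementation for the real keys.
agent.channels.kafka-channel.type = org.apache.flume.channel.kafka.KafkaChannel
agent.channels.kafka-channel.brokerList = kafka-broker1:9092,kafka-broker2:9092
agent.channels.kafka-channel.topic = flume-channel
agent.channels.kafka-channel.zookeeperConnect = zk-host:2181
{noformat}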





[jira] [Commented] (FLUME-2500) Add a channel that uses Kafka

2014-10-14 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171466#comment-14171466
 ] 

Hari Shreedharan commented on FLUME-2500:
-

Roshan:

There are 2 motivations for this channel:

#. Provide a distributed channel for Flume. We could implement one internally in 
Flume - which I would definitely prefer, and which is better operations-wise, 
but I don't have the cycles to do it - or use an existing service (like 
Hazelcast, used by Ashish, or Kafka in this case).
#. Given that a user already has a Kafka cluster, let them use Flume's sources 
and sinks either to get data into Kafka from the variety of Flume sources, or to 
write from Kafka to HDFS using Flume's sinks. The reason a channel works better 
than the Kafka source is that a dead Flume agent won't affect event delivery - 
the events just get routed via another agent.

> Add a channel that uses Kafka 
> --
>
> Key: FLUME-2500
> URL: https://issues.apache.org/jira/browse/FLUME-2500
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
>
> Here is the rationale:
> - Kafka gives us an HA channel, which means a dead agent does not affect the 
> data in the channel - thus reducing delivery delay.
> - Kafka is used by many companies - it would be a good idea to use Flume to 
> pull data from Kafka and write it to HDFS/HBase etc. 
> This channel is not going to be useful for cases where Kafka is not already 
> used, since it brings in the operational overhead of maintaining two systems, 
> but if Kafka is already in use, this is a good way to integrate Kafka and Flume.
> Here is a scratch implementation: 
> https://github.com/harishreedharan/flume/blob/kafka-channel/flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java





[jira] [Comment Edited] (FLUME-2500) Add a channel that uses Kafka

2014-10-14 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171444#comment-14171444
 ] 

Roshan Naik edited comment on FLUME-2500 at 10/14/14 8:10 PM:
--

My thoughts ...
- It seems better for clients to write directly to Kafka and bypass Flume 
altogether in such a case. 
- The data flow seems unnecessarily complex ... data gets pushed out to a remote 
service when going from source -> kafka channel, then brought back to the local 
host when the event flows from channel -> sink. A simpler data flow would be 
something like client -> kafka (via a local flume agent using a kafka sink) and 
then some subscriber which pulls from Kafka.
- Kafka being a remote service, both flume sources & sinks will be coupled to 
intermittent failures when communicating with Kafka (sort of like the jdbc 
channel).


was (Author: roshan_naik):
My thoughts ...
- It seems better for clients to write directly to Kafka and bypass Flume 
altogether in such a case. 
- The data flow seems unnecessarily complex ... data gets pushed out to a remote 
service when going from source -> kafka channel, then brought back to the local 
host when the event flows from channel -> sink. It seems better for the data 
flow to be something like client -> kafka (via a local flume agent using a 
kafka sink) and then some subscriber which pulls from Kafka.
- Kafka being a remote service, both flume sources & sinks will be coupled to 
intermittent failures when communicating with Kafka (sort of like the jdbc 
channel).

> Add a channel that uses Kafka 
> --
>
> Key: FLUME-2500
> URL: https://issues.apache.org/jira/browse/FLUME-2500
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
>
> Here is the rationale:
> - Kafka gives us an HA channel, which means a dead agent does not affect the 
> data in the channel - thus reducing delivery delay.
> - Kafka is used by many companies - it would be a good idea to use Flume to 
> pull data from Kafka and write it to HDFS/HBase etc. 
> This channel is not going to be useful for cases where Kafka is not already 
> used, since it brings in the operational overhead of maintaining two systems, 
> but if Kafka is already in use, this is a good way to integrate Kafka and Flume.
> Here is a scratch implementation: 
> https://github.com/harishreedharan/flume/blob/kafka-channel/flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java





[jira] [Comment Edited] (FLUME-2500) Add a channel that uses Kafka

2014-10-14 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171444#comment-14171444
 ] 

Roshan Naik edited comment on FLUME-2500 at 10/14/14 8:11 PM:
--

My thoughts ...
- It seems better for clients to write directly to Kafka and bypass the Flume 
Kafka channel altogether in such a case. 
- The data flow seems unnecessarily complex ... data gets pushed out to a remote 
service when going from source -> kafka channel, then brought back to the local 
host when the event flows from channel -> sink. A simpler data flow would be 
something like client -> kafka (via a local flume agent using a kafka sink) and 
then some subscriber which pulls from Kafka.
- Kafka being a remote service, both flume sources & sinks will be coupled to 
intermittent failures when communicating with Kafka (sort of like the jdbc 
channel).


was (Author: roshan_naik):
My thoughts ...
- It seems better for clients to write directly to Kafka and bypass Flume 
altogether in such a case. 
- The data flow seems unnecessarily complex ... data gets pushed out to a remote 
service when going from source -> kafka channel, then brought back to the local 
host when the event flows from channel -> sink. A simpler data flow would be 
something like client -> kafka (via a local flume agent using a kafka sink) and 
then some subscriber which pulls from Kafka.
- Kafka being a remote service, both flume sources & sinks will be coupled to 
intermittent failures when communicating with Kafka (sort of like the jdbc 
channel).

> Add a channel that uses Kafka 
> --
>
> Key: FLUME-2500
> URL: https://issues.apache.org/jira/browse/FLUME-2500
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
>
> Here is the rationale:
> - Kafka gives us an HA channel, which means a dead agent does not affect the 
> data in the channel - thus reducing delivery delay.
> - Kafka is used by many companies - it would be a good idea to use Flume to 
> pull data from Kafka and write it to HDFS/HBase etc. 
> This channel is not going to be useful for cases where Kafka is not already 
> used, since it brings in the operational overhead of maintaining two systems, 
> but if Kafka is already in use, this is a good way to integrate Kafka and Flume.
> Here is a scratch implementation: 
> https://github.com/harishreedharan/flume/blob/kafka-channel/flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java





[jira] [Commented] (FLUME-2500) Add a channel that uses Kafka

2014-10-14 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171444#comment-14171444
 ] 

Roshan Naik commented on FLUME-2500:


My thoughts ...
- It seems better for clients to write directly to Kafka and bypass Flume 
altogether in such a case. 
- The data flow seems unnecessarily complex ... data gets pushed out to a remote 
service when going from source -> kafka channel, then brought back to the local 
host when the event flows from channel -> sink. It seems better for the data 
flow to be something like client -> kafka (via a local flume agent using a 
kafka sink) and then some subscriber which pulls from Kafka.
- Kafka being a remote service, both flume sources & sinks will be coupled to 
intermittent failures when communicating with Kafka (sort of like the jdbc 
channel).

> Add a channel that uses Kafka 
> --
>
> Key: FLUME-2500
> URL: https://issues.apache.org/jira/browse/FLUME-2500
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
>
> Here is the rationale:
> - Kafka gives us an HA channel, which means a dead agent does not affect the 
> data in the channel - thus reducing delivery delay.
> - Kafka is used by many companies - it would be a good idea to use Flume to 
> pull data from Kafka and write it to HDFS/HBase etc. 
> This channel is not going to be useful for cases where Kafka is not already 
> used, since it brings in the operational overhead of maintaining two systems, 
> but if Kafka is already in use, this is a good way to integrate Kafka and Flume.
> Here is a scratch implementation: 
> https://github.com/harishreedharan/flume/blob/kafka-channel/flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java





[jira] [Commented] (FLUME-1585) Document the load balancing sink processor

2014-10-14 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171423#comment-14171423
 ] 

Roshan Naik commented on FLUME-1585:


It seems the priority & penalty options lack a proper explanation.

> Document the load balancing sink processor
> --
>
> Key: FLUME-1585
> URL: https://issues.apache.org/jira/browse/FLUME-1585
> Project: Flume
>  Issue Type: Documentation
>Reporter: Mike Percy
>Assignee: Ashish Paliwal
>
> We need to document the load balancing sink processor, including backoff 
> options.
> Example config:
> {noformat}
> # test file
> agent.channels = ch-0
> agent.sources = src-0
> agent.sinks = sink-0 sink-1 sink-2
> agent.sinkgroups = group-0
> agent.channels.ch-0.type = memory
> agent.channels.ch-0.capacity = 1
> agent.sources.src-0.type = netcat
> agent.sources.src-0.channels = ch-0
> agent.sources.src-0.bind = 0.0.0.0
> agent.sources.src-0.port = 10002
> agent.sinkgroups.group-0.sinks = sink-0 sink-1 sink-2
> agent.sinkgroups.group-0.processor.type = load_balance
> agent.sinkgroups.group-0.processor.selector = round_robin
> agent.sinkgroups.group-0.processor.backoff = true
> agent.sinks.sink-0.type = avro
> agent.sinks.sink-0.channel = ch-0
> agent.sinks.sink-0.hostname = 127.0.0.1
> agent.sinks.sink-0.port = 999
> agent.sinks.sink-1.type = avro
> agent.sinks.sink-1.channel = ch-0
> agent.sinks.sink-1.hostname = 127.0.0.1
> agent.sinks.sink-1.port = 999
> agent.sinks.sink-2.type = logger
> agent.sinks.sink-2.channel = ch-0
> {noformat}





[jira] [Updated] (FLUME-2502) Spool source's directory listing is inefficient

2014-10-14 Thread Prateek Rungta (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prateek Rungta updated FLUME-2502:
--
Attachment: FLUME-2502-1.patch

Adding a new patch which consolidates the "BATCH" and "RANDOM" consume orders. 

Re your comments:
- The new code uses the same filter as the existing code base for retrieving 
the file listing, so no issue there.

- If a file does not exist after it has been listed from the directory: the 
code that handles this is unchanged from what was done earlier. My 
understanding is that the `openFile` method will handle the errors as you 
indicated, i.e. log an error and move on.

> Spool source's directory listing is inefficient
> ---
>
> Key: FLUME-2502
> URL: https://issues.apache.org/jira/browse/FLUME-2502
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Prateek Rungta
> Attachments: FLUME-2502-0.patch, FLUME-2502-1.patch
>
>
> As mentioned in 
> [FLUME-2309|https://issues.apache.org/jira/browse/FLUME-2309], the directory 
> listing can itself become the bottleneck when accessing directories with a 
> large number of files (>1M). The fix in that JIRA added the ability to 
> specify `RANDOM` as a consume order to avoid sorting large lists.
> The slowness of the directory listing itself is still unaddressed. 





[jira] [Commented] (FLUME-2502) Spool source's directory listing is inefficient

2014-10-14 Thread Prateek Rungta (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171330#comment-14171330
 ] 

Prateek Rungta commented on FLUME-2502:
---

[~hshreedharan] cool! I debated putting it in the same place as "RANDOM". The 
only reason I wanted to put it in a separate section was that it changes the 
amount of memory used by the agent. I haven't profiled it to know how much 
memory it now uses, and I didn't want to affect existing users of "RANDOM". But 
I'm not a fan of the new consume order either. 

I'll defer to your judgement on which is the lesser of the two evils :)

> Spool source's directory listing is inefficient
> ---
>
> Key: FLUME-2502
> URL: https://issues.apache.org/jira/browse/FLUME-2502
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Prateek Rungta
> Attachments: FLUME-2502-0.patch
>
>
> As mentioned in 
> [FLUME-2309|https://issues.apache.org/jira/browse/FLUME-2309], the directory 
> listing can itself become the bottleneck when accessing directories with a 
> large number of files (>1M). The fix in that JIRA added the ability to 
> specify `RANDOM` as a consume order to avoid sorting large lists.
> The slowness of the directory listing itself is still unaddressed. 





[jira] [Commented] (FLUME-2502) Spool source's directory listing is inefficient

2014-10-14 Thread Prateek Rungta (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171321#comment-14171321
 ] 

Prateek Rungta commented on FLUME-2502:
---

[~hshreedharan] I'm not familiar with that API, but it looks like exactly 
what's needed in the long run. In the interim, can we apply this patch? 

Without the patch, the Spool Source is unusable for large directories, and this 
is exacerbated when using the BlobDeserializer.

> Spool source's directory listing is inefficient
> ---
>
> Key: FLUME-2502
> URL: https://issues.apache.org/jira/browse/FLUME-2502
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Prateek Rungta
> Attachments: FLUME-2502-0.patch
>
>
> As mentioned in 
> [FLUME-2309|https://issues.apache.org/jira/browse/FLUME-2309], the directory 
> listing can itself become the bottleneck when accessing directories with a 
> large number of files (>1M). The fix in that JIRA added the ability to 
> specify `RANDOM` as a consume order to avoid sorting large lists.
> The slowness of the directory listing itself is still unaddressed. 





[jira] [Commented] (FLUME-2502) Spool source's directory listing is inefficient

2014-10-14 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171319#comment-14171319
 ] 

Hari Shreedharan commented on FLUME-2502:
-

Thanks for the patch! Instead of adding a separate config param for this, let's 
make this the default behavior for RANDOM. Just ensure that you have all edge 
cases covered, e.g. a listed file that is no longer available - log an error and 
move on to the next file. Also ensure that the original filter is respected.

> Spool source's directory listing is inefficient
> ---
>
> Key: FLUME-2502
> URL: https://issues.apache.org/jira/browse/FLUME-2502
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Prateek Rungta
> Attachments: FLUME-2502-0.patch
>
>
> As mentioned in 
> [FLUME-2309|https://issues.apache.org/jira/browse/FLUME-2309], the directory 
> listing can itself become the bottleneck when accessing directories with a 
> large number of files (>1M). The fix in that JIRA added the ability to 
> specify `RANDOM` as a consume order to avoid sorting large lists.
> The slowness of the directory listing itself is still unaddressed. 





[jira] [Commented] (FLUME-2502) Spool source's directory listing is inefficient

2014-10-14 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171307#comment-14171307
 ] 

Hari Shreedharan commented on FLUME-2502:
-

Once we support Java 7+ only, we can use the WatchService to make sure we don't 
do a full file listing each time; instead, we can keep track of files as they 
get added.
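As a rough sketch of that idea (plain JDK 7 java.nio.file API; this is not the 
actual Flume change, just an illustration of the mechanism), the source could 
block on directory events instead of re-listing the directory:

```java
import java.nio.file.*;

public class SpoolWatchDemo {
    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("spool");
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

        // Simulate a new spool file arriving after the watch is registered.
        Files.createFile(dir.resolve("events.log"));

        // Block until the creation event is delivered - no directory re-listing.
        WatchKey key = watcher.take();
        for (WatchEvent<?> event : key.pollEvents()) {
            System.out.println("new file: " + event.context());
        }
        key.reset();
        watcher.close();
    }
}
```

Note that on platforms without a native notification facility the JDK falls 
back to a polling implementation, so delivery latency varies.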



> Spool source's directory listing is inefficient
> ---
>
> Key: FLUME-2502
> URL: https://issues.apache.org/jira/browse/FLUME-2502
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Prateek Rungta
>
> As mentioned in 
> [FLUME-2309|https://issues.apache.org/jira/browse/FLUME-2309], the directory 
> listing can itself become the bottleneck when accessing directories with a 
> large number of files (>1M). The fix in that JIRA added the ability to 
> specify `RANDOM` as a consume order to avoid sorting large lists.
> The slowness of the directory listing itself is still unaddressed. 





[jira] [Updated] (FLUME-2502) Spool source's directory listing is inefficient

2014-10-14 Thread Prateek Rungta (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prateek Rungta updated FLUME-2502:
--
Attachment: FLUME-2502-0.patch

Initial Attempt with "BATCH" Consume Order 

> Spool source's directory listing is inefficient
> ---
>
> Key: FLUME-2502
> URL: https://issues.apache.org/jira/browse/FLUME-2502
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Prateek Rungta
> Attachments: FLUME-2502-0.patch
>
>
> As mentioned in 
> [FLUME-2309|https://issues.apache.org/jira/browse/FLUME-2309], the directory 
> listing can itself become the bottleneck when accessing directories with a 
> large number of files (>1M). The fix in that JIRA added the ability to 
> specify `RANDOM` as a consume order to avoid sorting large lists.
> The slowness of the directory listing itself is still unaddressed. 





[jira] [Commented] (FLUME-2502) Spool source's directory listing is inefficient

2014-10-14 Thread Prateek Rungta (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171302#comment-14171302
 ] 

Prateek Rungta commented on FLUME-2502:
---

I found that by saving the directory listing across runs, I was able to get a 
10x performance boost in the rate of consumption vs. the `RANDOM` consume order. 
This rate of consumption also remained more stable as the directory size grew. 

I've attached a patch with the changes in a new consume order: "BATCH". 
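The gist of caching the listing across consume calls can be sketched as follows 
(an illustrative toy, not the actual FLUME-2502 patch): list the directory once, 
then drain the cached batch, re-listing only when it is exhausted.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayDeque;
import java.util.Deque;

public class BatchListingDemo {
    private final File dir;
    private final Deque<File> pending = new ArrayDeque<>();

    BatchListingDemo(File dir) { this.dir = dir; }

    // Re-list the directory only when the cached batch is drained;
    // between re-lists, handing out the next file is O(1).
    File nextFile() {
        if (pending.isEmpty()) {
            File[] listed = dir.listFiles(File::isFile);
            if (listed != null) {
                for (File f : listed) pending.add(f);
            }
        }
        return pending.poll(); // null once the directory is empty
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("spool").toFile();
        new File(dir, "a.log").createNewFile();
        new File(dir, "b.log").createNewFile();

        BatchListingDemo listing = new BatchListingDemo(dir);
        int served = 0;
        File f;
        while ((f = listing.nextFile()) != null) {
            f.delete(); // consumed files are removed, as the spool source does
            served++;
        }
        System.out.println("served " + served + " files");
    }
}
```

The trade-off Prateek mentions is visible here: the whole batch is held in 
memory between re-lists, which grows with the directory size.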


> Spool source's directory listing is inefficient
> ---
>
> Key: FLUME-2502
> URL: https://issues.apache.org/jira/browse/FLUME-2502
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.5.0
>Reporter: Prateek Rungta
>
> As mentioned in 
> [FLUME-2309|https://issues.apache.org/jira/browse/FLUME-2309], the directory 
> listing can itself become the bottleneck when accessing directories with a 
> large number of files (>1M). The fix in that JIRA added the ability to 
> specify `RANDOM` as a consume order to avoid sorting large lists.
> The slowness of the directory listing itself is still unaddressed. 





[jira] [Created] (FLUME-2502) Spool source's directory listing is inefficient

2014-10-14 Thread Prateek Rungta (JIRA)
Prateek Rungta created FLUME-2502:
-

 Summary: Spool source's directory listing is inefficient
 Key: FLUME-2502
 URL: https://issues.apache.org/jira/browse/FLUME-2502
 Project: Flume
  Issue Type: Improvement
  Components: Sinks+Sources
Affects Versions: v1.5.0
Reporter: Prateek Rungta


As mentioned in [FLUME-2309|https://issues.apache.org/jira/browse/FLUME-2309], 
the directory listing can itself become the bottleneck when accessing 
directories with a large number of files (>1M). The fix in that JIRA added the 
ability to specify `RANDOM` as a consume order to avoid sorting large lists.

The slowness of the directory listing itself is still unaddressed. 





[jira] [Commented] (FLUME-2498) Implement Taildir Source

2014-10-14 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171202#comment-14171202
 ] 

Otis Gospodnetic commented on FLUME-2498:
-

Thanks!
Questions:
* what about Windows support?  Should that be another row?
* what does Pollable mean?
* what does Append header mean/do?  When is this used, how/why is it useful?


> Implement Taildir Source
> 
>
> Key: FLUME-2498
> URL: https://issues.apache.org/jira/browse/FLUME-2498
> Project: Flume
>  Issue Type: New Feature
>  Components: Sinks+Sources
>Reporter: Satoshi Iijima
> Attachments: FLUME-2498.patch
>
>
> This is a proposal to implement a new tailing source.
> This source watches the specified files and tails them in near real time 
> once appends to these files are detected.
> * This source is reliable and will not miss data even when the tailed files 
> rotate.
> * It periodically writes the last read position of each file to a position 
> file in JSON format.
> * If Flume is stopped or goes down for some reason, it can resume tailing 
> from the position recorded in the existing position file.
> * It can add event headers to each tailed file group. 
> The attached patch includes configuration documentation for this source.
> This source requires a Unix-style file system and Java 1.7 or later.
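To make the position-file mechanism concrete, here is a sketch of what such a 
JSON position file could contain. The field names and values are illustrative 
assumptions; the attached patch defines the actual format.

{noformat}
[
  {"inode": 2496272, "pos": 166, "file": "/var/log/app/access.log"},
  {"inode": 2496275, "pos": 0, "file": "/var/log/app/error.log"}
]
{noformat}

On restart, the source would seek each file to its recorded position and 
continue tailing from there, which is what makes the source reliable across 
agent restarts.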





[jira] [Commented] (FLUME-2482) Race condition in File Channels' Log.removeOldLogs

2014-10-14 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171016#comment-14171016
 ] 

Santiago M. Mola commented on FLUME-2482:
-

[~jrufus] Sounds good.

> Race condition in File Channels' Log.removeOldLogs
> --
>
> Key: FLUME-2482
> URL: https://issues.apache.org/jira/browse/FLUME-2482
> Project: Flume
>  Issue Type: Bug
>  Components: File Channel
>Affects Versions: v1.5.0.1
>Reporter: Santiago M. Mola
>  Labels: race-condition
> Fix For: v1.6.0
>
> Attachments: FLUME-2482.patch
>
>
> TestFileChannelRestart.testToggleCheckpointCompressionFromFalseToTrue 
> sometimes produces a race condition at Log.removeOldLogs.
> https://travis-ci.org/Stratio/flume/jobs/36782318#L6193
> testToggleCheckpointCompressionFromFalseToTrue(org.apache.flume.channel.file.TestFileChannelRestart)
>  Time elapsed: 144362 sec <<< ERROR!
> java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
> at java.util.ArrayList$Itr.next(ArrayList.java:831)
> at org.apache.flume.channel.file.Log.removeOldLogs(Log.java:1070)
> at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1055)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.fest.reflect.method.Invoker.invoke(Invoker.java:110)
> at org.apache.flume.channel.file.TestUtils.forceCheckpoint(TestUtils.java:134)
> at 
> org.apache.flume.channel.file.TestFileChannelRestart.restartToggleCompression(TestFileChannelRestart.java:930)
> at 
> org.apache.flume.channel.file.TestFileChannelRestart.testToggleCheckpointCompressionFromFalseToTrue(TestFileChannelRestart.java:896)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> at 
> org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
> at 
> org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
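The exception in the stack trace above is the classic remove-while-iterating 
failure. In FLUME-2482 it is triggered by a thread race on the shared log list 
rather than by single-threaded code, but a minimal single-threaded reproduction 
(hypothetical, not Flume's code) shows the mechanism and the usual 
iterator-based fix:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class RemoveOldLogsDemo {
    public static void main(String[] args) {
        List<Integer> logIds = new ArrayList<>(Arrays.asList(1, 2, 3, 4));

        // Mutating the list while a for-each loop iterates it throws
        // ConcurrentModificationException from Itr.checkForComodification,
        // the same check that fires in Log.removeOldLogs.
        try {
            for (Integer id : logIds) {
                if (id < 3) logIds.remove(id);
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("CME");
        }

        // Safe variant: remove through the iterator. (For a cross-thread
        // race like FLUME-2482, the fix is locking or iterating a copy.)
        Iterator<Integer> it = logIds.iterator();
        while (it.hasNext()) {
            if (it.next() < 3) it.remove();
        }
        System.out.println(logIds);
    }
}
```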





[jira] [Commented] (FLUME-2498) Implement Taildir Source

2014-10-14 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170734#comment-14170734
 ] 

Satoshi Iijima commented on FLUME-2498:
---

I compared this source with other source components which can tail or read 
file(s). If any of the contents are incorrect, please let me know.

Comparison with tail-pollable-source (FLUME-2344) and jambalay-file-source:
http://vschart.com/compare/flume-taildir-source/vs/flume-tail-pollable-source-flume-2344/vs/flume-jambalay-file-source
 
Comparison with the spooling directory source and the Exec source (tail -F):
http://vschart.com/compare/flume-taildir-source/vs/flume-spooling-directory/vs/flume-exec-source-tail-f

> Implement Taildir Source
> 
>
> Key: FLUME-2498
> URL: https://issues.apache.org/jira/browse/FLUME-2498
> Project: Flume
>  Issue Type: New Feature
>  Components: Sinks+Sources
>Reporter: Satoshi Iijima
> Attachments: FLUME-2498.patch
>
>
> This is a proposal to implement a new tailing source.
> This source watches the specified files and tails them in near real time 
> once appends to these files are detected.
> * This source is reliable and will not miss data even when the tailed files 
> rotate.
> * It periodically writes the last read position of each file to a position 
> file in JSON format.
> * If Flume is stopped or goes down for some reason, it can resume tailing 
> from the position recorded in the existing position file.
> * It can add event headers to each tailed file group. 
> The attached patch includes configuration documentation for this source.
> This source requires a Unix-style file system and Java 1.7 or later.





[jira] [Commented] (FLUME-2197) Memory Channel has GC issues

2014-10-14 Thread Ashish Paliwal (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170662#comment-14170662
 ] 

Ashish Paliwal commented on FLUME-2197:
---

Can you please share the following:
1. Input event rate at the Source (number of events/sec received at the Source)
2. Sink drain rate (number of events drained at the Sink)
3. Channel size
4. Event size

> Memory Channel has GC issues
> 
>
> Key: FLUME-2197
> URL: https://issues.apache.org/jira/browse/FLUME-2197
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Roshan Naik
> Attachments: HA_result.jpg, mem ch - mem alloc.png, spill ch - mem 
> alloc.png
>
>
> Due to the fact that we use a LinkedBlockingDeque as the backing queue for 
> the MemoryChannel, we end up hitting GC issues more often than we should.





[jira] [Commented] (FLUME-2197) Memory Channel has GC issues

2014-10-14 Thread li xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170620#comment-14170620
 ] 

li xiang commented on FLUME-2197:
-

Hi Roshan, no, I got the OutOfMemoryError with Xmx=6G, Xms=2G. 

> Memory Channel has GC issues
> 
>
> Key: FLUME-2197
> URL: https://issues.apache.org/jira/browse/FLUME-2197
> Project: Flume
>  Issue Type: Bug
>Reporter: Hari Shreedharan
>Assignee: Roshan Naik
> Attachments: HA_result.jpg, mem ch - mem alloc.png, spill ch - mem 
> alloc.png
>
>
> Due to the fact that we use a LinkedBlockingDeque as the backing queue for 
> the MemoryChannel, we end up hitting GC issues more often than we should.





[jira] [Updated] (FLUME-2246) event body data size can make it configurable for logger sinker

2014-10-14 Thread Ashish Paliwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Paliwal updated FLUME-2246:
--
Attachment: FLUME-2246-1.patch

Updated patch with review comments addressed, minus the assert comment. Not 
sure how to add an assert by intercepting logged data; verified manually in the 
test logs.

> event body data size can make it configurable for logger sinker
> ---
>
> Key: FLUME-2246
> URL: https://issues.apache.org/jira/browse/FLUME-2246
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Gopinathan A
>Assignee: Ashish Paliwal
>Priority: Minor
> Attachments: FLUME-2246-0.patch, FLUME-2246-1.patch
>
>
> Currently the logger sink dumps only 16 bytes of data into the log file; it 
> would be better to make this configurable.


