[jira] [Commented] (FLUME-2245) HDFS files with errors unable to close

2014-05-14 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997133#comment-13997133
 ] 

Hari Shreedharan commented on FLUME-2245:
-----------------------------------------

[~juhanic] - Do you want to just do that one? If yes, please submit a new
patch - I will commit it.

> HDFS files with errors unable to close
> --------------------------------------
>
> Key: FLUME-2245
> URL: https://issues.apache.org/jira/browse/FLUME-2245
> Project: Flume
>  Issue Type: Bug
>Reporter: Juhani Connolly
> Attachments: flume.log.1133, flume.log.file
>
>
> This is running on a snapshot of Flume 1.5, git hash 
> 99db32ccd163daf9d7685f0e8485941701e1133d.
> When a datanode goes unresponsive for a significant amount of time (for 
> example, a big GC), an append failure occurs, followed by repeated timeouts 
> in the log and a failure to close the stream. The relevant section of the 
> logs is attached (starting where the errors first appear).
> The same log repeats periodically, consistently running into a 
> TimeoutException.
> Restarting Flume (or presumably just the HDFSSink) resolves the issue.
> Probable cause is in the comments.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2126) Problem in elasticsearch sink when the event body is a complex field

2014-05-14 Thread Deepak Subhramanian (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992844#comment-13992844
 ] 

Deepak Subhramanian commented on FLUME-2126:
--------------------------------------------

I am having the same problem while posting JSON data, and I am considering 
using the temporary fix suggested. Are there any plans to fix this in a 
future version?

> Problem in elasticsearch sink when the event body is a complex field
> --------------------------------------------------------------------
>
> Key: FLUME-2126
> URL: https://issues.apache.org/jira/browse/FLUME-2126
> Project: Flume
>  Issue Type: Bug
>  Components: Sinks+Sources
> Environment: 1.3.1 and 1.4
>Reporter: Massimo Paladin
>Assignee: Ashish Paliwal
>
> I have found a bug in the elasticsearch sink. The problem is in the 
> {{ContentBuilderUtil.addComplexField}} method: when it calls 
> {{builder.field(fieldName, tmp);}}, the {{tmp}} object is passed as an 
> {{Object}}, with the result that it is serialized via its {{toString}} 
> method in the {{XContentBuilder}}. In the end you get the object reference 
> as the content.
> The following change works around the problem for me; the downside is that 
> it has to parse the content twice. I guess there is a better way to solve 
> the problem, but I am not an elasticsearch API expert. 
> {code}
> --- 
> a/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> +++ 
> b/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> @@ -61,7 +61,12 @@ public class ContentBuilderUtil {
>parser = XContentFactory.xContent(contentType).createParser(data);
>parser.nextToken();
>tmp.copyCurrentStructure(parser);
> -  builder.field(fieldName, tmp);
> +
> +  // if it is a valid structure then we include it
> +  parser = XContentFactory.xContent(contentType).createParser(data);
> +  parser.nextToken();
> +  builder.field(fieldName);
> +  builder.copyCurrentStructure(parser);
>  } catch (JsonParseException ex) {
>    // If we get an exception here the most likely cause is nested JSON that
>    // can't be figured out in the body. At this point just push it through
> {code}
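
For reference, a hedged sketch of what {{addComplexField}} might look like with 
the workaround above applied, assuming the ES client API used by the Flume 
1.4-era sink. The class name, the broad catch clause, and the UTF-8 fallback 
are assumptions of this sketch, not part of the reporter's patch or the 
committed fix.

{code}
import java.io.IOException;

import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.common.xcontent.XContentParser;
import org.elasticsearch.common.xcontent.XContentType;

// Sketch only: mirrors the diff above, not the committed fix.
public class ContentBuilderUtilSketch {
  public static void addComplexField(XContentBuilder builder, String fieldName,
      XContentType contentType, byte[] data) throws IOException {
    XContentParser parser = null;
    try {
      // First pass: parse into a throwaway builder purely to validate
      // that the payload is well-formed structured content.
      XContentBuilder tmp = XContentFactory.contentBuilder(contentType);
      parser = XContentFactory.xContent(contentType).createParser(data);
      parser.nextToken();
      tmp.copyCurrentStructure(parser);

      // Second pass: copy the structure into the real builder, so the field
      // is embedded as structured content instead of the Object.toString()
      // reference the bug report describes.
      parser = XContentFactory.xContent(contentType).createParser(data);
      parser.nextToken();
      builder.field(fieldName);
      builder.copyCurrentStructure(parser);
    } catch (Exception ex) {
      // Unparseable (e.g. deeply nested) content: fall back to a plain
      // string field rather than dropping the event.
      builder.field(fieldName, new String(data, "UTF-8"));
    } finally {
      if (parser != null) {
        parser.close();
      }
    }
  }
}
{code}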



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2245) HDFS files with errors unable to close

2014-05-14 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997839#comment-13997839
 ] 

Hari Shreedharan commented on FLUME-2245:
-----------------------------------------

I am going to give both of you credit in the commit message since both your 
patches made sense :-)

> HDFS files with errors unable to close
> --------------------------------------
>
> Key: FLUME-2245
> URL: https://issues.apache.org/jira/browse/FLUME-2245
> Project: Flume
>  Issue Type: Bug
>Reporter: Juhani Connolly
> Attachments: FLUME-2245.patch, flume.log.1133, flume.log.file
>
>
> This is running on a snapshot of Flume 1.5, git hash 
> 99db32ccd163daf9d7685f0e8485941701e1133d.
> When a datanode goes unresponsive for a significant amount of time (for 
> example, a big GC), an append failure occurs, followed by repeated timeouts 
> in the log and a failure to close the stream. The relevant section of the 
> logs is attached (starting where the errors first appear).
> The same log repeats periodically, consistently running into a 
> TimeoutException.
> Restarting Flume (or presumably just the HDFSSink) resolves the issue.
> Probable cause is in the comments.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (FLUME-2383) Add option to enable agent exit if a component fails to start

2014-05-14 Thread Brock Noland (JIRA)
Brock Noland created FLUME-2383:
--------------------------------

 Summary: Add option to enable agent exit if a component fails to 
start
 Key: FLUME-2383
 URL: https://issues.apache.org/jira/browse/FLUME-2383
 Project: Flume
  Issue Type: Improvement
Reporter: Brock Noland


Some users do not like that our agent continues to run despite a failed 
component. We should have an option so users can tell the Flume agent to exit 
if any component fails to start.
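
A hedged sketch of the shape such an option could take. Everything below (the 
class, the flag, the System.exit policy) is a hypothetical illustration, not 
Flume's actual lifecycle code.

{code}
// Hypothetical illustration only: names and the System.exit() policy are
// assumptions, not Flume's actual LifecycleSupervisor behavior.
public class FailFastComponentStarter {
  private final boolean exitOnStartFailure; // e.g. a new agent-level option

  public FailFastComponentStarter(boolean exitOnStartFailure) {
    this.exitOnStartFailure = exitOnStartFailure;
  }

  public void start(String componentName, Runnable startAction) {
    try {
      startAction.run();
    } catch (RuntimeException e) {
      if (exitOnStartFailure) {
        // Proposed behavior: take the whole agent down instead of
        // continuing to run with a dead component.
        System.err.println("Component " + componentName
            + " failed to start, exiting: " + e);
        System.exit(1);
      }
      // Current behavior: log and keep the agent alive.
      System.err.println("Component " + componentName
          + " failed to start: " + e);
    }
  }
}
{code}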



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2245) HDFS files with errors unable to close

2014-05-14 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997131#comment-13997131
 ] 

Hari Shreedharan commented on FLUME-2245:
-----------------------------------------

This patch does not really need the changes in the HDFSDataStream and 
HDFSCompressedStream classes. We should just catch the exception thrown by the 
flush and try to close. If the close fails, it will get rescheduled anyway.
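
A minimal sketch of that approach; the {{HdfsWriter}} interface and both 
method names below are placeholders standing in for the real writer, not the 
actual BucketWriter API.

{code}
import java.io.IOException;

// Placeholder interface standing in for the real writer; not Flume's API.
interface HdfsWriter {
  void flush() throws IOException;
  void close() throws IOException;
}

class FlushThenCloseSketch {
  void flushAndClose(HdfsWriter writer) {
    try {
      writer.flush();
    } catch (IOException e) {
      // Flush failed (e.g. an unresponsive datanode): fall through and
      // still attempt the close instead of leaving the file open forever.
    }
    try {
      writer.close();
    } catch (IOException e) {
      // If the close also fails, it gets rescheduled anyway, so there is
      // no need to escalate here.
    }
  }
}
{code}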

> HDFS files with errors unable to close
> --------------------------------------
>
> Key: FLUME-2245
> URL: https://issues.apache.org/jira/browse/FLUME-2245
> Project: Flume
>  Issue Type: Bug
>Reporter: Juhani Connolly
> Attachments: flume.log.1133, flume.log.file
>
>
> This is running on a snapshot of Flume 1.5, git hash 
> 99db32ccd163daf9d7685f0e8485941701e1133d.
> When a datanode goes unresponsive for a significant amount of time (for 
> example, a big GC), an append failure occurs, followed by repeated timeouts 
> in the log and a failure to close the stream. The relevant section of the 
> logs is attached (starting where the errors first appear).
> The same log repeats periodically, consistently running into a 
> TimeoutException.
> Restarting Flume (or presumably just the HDFSSink) resolves the issue.
> Probable cause is in the comments.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (FLUME-2273) ElasticSearchSink: Add handling for header substitution in indexName

2014-05-14 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997311#comment-13997311
 ] 

Satoshi Iijima commented on FLUME-2273:
---------------------------------------

Hi Deepak - your error is probably an issue with the EventSerializer or with 
Elasticsearch itself. Documents indexed under the same indexType must share 
the same data structure in Elasticsearch:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_document_metadata.html
Otherwise it is FLUME-2126 or a new, as yet unknown issue.


> ElasticSearchSink: Add handling for header substitution in indexName
> --------------------------------------------------------------------
>
> Key: FLUME-2273
> URL: https://issues.apache.org/jira/browse/FLUME-2273
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Paul Merry
>Priority: Minor
> Attachments: FLUME-2273.patch, new_FLUME-2273-2.patch, 
> new_FLUME-2273-5.patch, new_FLUME-2273.patch
>
>
> The ElasticSearchSink would be improved by allowing for header substitution 
> in the indexName property.
> A use case is where the sink is an intermediate part of a chain and the index 
> name is required to identify the message origin; at present it can only be a 
> hardcoded value.
> The HDFS sink already supports header substitution, so a similar format would 
> maintain consistency. 
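
A hedged sketch of how the substitution might be done, reusing the same 
%{header} escaping the HDFS sink relies on; whether the ES sink should call 
{{BucketPath}} directly is an assumption of this sketch, and the header and 
sink names are placeholders.

{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.flume.formatter.output.BucketPath;

// Sketch: resolve %{header} tokens in indexName the way the HDFS sink
// resolves them in hdfs.path. Reusing BucketPath here is an assumption.
public class IndexNameSubstitutionDemo {
  public static void main(String[] args) {
    Map<String, String> headers = new HashMap<String, String>();
    headers.put("origin", "webserver01");

    // e.g. agent.sinks.es.indexName = flume-%{origin} in the config
    String configured = "flume-%{origin}";
    String resolved = BucketPath.escapeString(configured, headers);

    System.out.println(resolved); // prints: flume-webserver01
  }
}
{code}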



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (FLUME-1734) Create a Hive Sink based on the new Hive Streaming support

2014-05-14 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated FLUME-1734:
-------------------------------

Attachment: FLUME-1734.v1.patch

Updating FLUME-1734.v1.patch.

> Create a Hive Sink based on the new Hive Streaming support
> ----------------------------------------------------------
>
> Key: FLUME-1734
> URL: https://issues.apache.org/jira/browse/FLUME-1734
> Project: Flume
>  Issue Type: New Feature
>  Components: Sinks+Sources
>Affects Versions: v1.2.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: features
> Attachments: FLUME-1734.draft.1.patch, FLUME-1734.draft.2.patch, 
> FLUME-1734.v1.patch, FLUME-1734.v1.patch
>
>
> Create a sink that would stream data into HCatalog partitions. The primary 
> goal is that once the data is loaded into Hadoop, it should be automatically 
> queryable (using, say, Hive or Pig) without requiring additional 
> post-processing steps on behalf of the users. The sink should manage the 
> creation of new partitions and commit them periodically. 
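
For background, a hedged sketch of the raw Hive streaming ingest API 
(hive-hcatalog-streaming, as of Hive 0.13) that such a sink would wrap; the 
metastore URI, database, table, partition, and column names below are made-up 
placeholders.

{code}
import java.util.Arrays;

import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

// Sketch of the streaming API the proposed sink builds on; all names
// below (URI, db, table, partition, columns) are placeholders.
public class HiveStreamingSketch {
  public static void main(String[] args) throws Exception {
    HiveEndPoint endPoint = new HiveEndPoint("thrift://metastore-host:9083",
        "default", "web_logs", Arrays.asList("2014-05-14"));
    StreamingConnection conn = endPoint.newConnection(true); // auto-create partition

    DelimitedInputWriter writer =
        new DelimitedInputWriter(new String[] {"host", "msg"}, ",", endPoint);

    // Events are written inside transactions fetched in batches; committed
    // data becomes queryable from Hive without post-processing.
    TransactionBatch txnBatch = conn.fetchTransactionBatch(10, writer);
    txnBatch.beginNextTransaction();
    txnBatch.write("web01,hello world".getBytes("UTF-8"));
    txnBatch.commit();
    txnBatch.close();

    conn.close();
  }
}
{code}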



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Flume incoming logs - previous day

2014-05-14 Thread Divya R
Hi Roshan,

   Can you suggest any commands or tools with which I can check the
drain rate and HTTP metrics specifically for Flume?

Thanks and Regards,
-Divya


On Wed, May 14, 2014 at 6:28 AM, Roshan Naik  wrote:

> To triage this, you may want to turn on HTTP metrics and check how full the
> channel is on that agent. Also calculate the drain rate on that. Contrast
> that with how the other agents are behaving.
> -roshan
>
>
> On Mon, May 12, 2014 at 2:02 AM, Divya R  wrote:
>
> > Hi Guys,
> >
> >   I have been using Flume for a year and a half. I have Flume agents
> > running on around 12 machines, and all are working fine. Only on one of
> > the client machines am I receiving the previous day's logs. There is
> > absolutely no change in configuration from the other machines, and no
> > exceptions either. I am using TimestampInterceptor. Some of the machines
> > are on IST, CEST, or PST, but this machine's time zone is SAST (South
> > African Standard Time). Can this be an issue in any regard for Flume?
> >
> >   When I delete the flume_channel and restart, I receive today's logs,
> > but it goes back to the previous day's logs the following day.
> >
> >   Any help in this regard is greatly appreciated.
> >
> > Thanks and Regards,
> > -Divya
> >
>
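
A minimal sketch of one way to check the drain rate Roshan mentions, assuming 
the agent was started with the JSON monitoring flags 
(-Dflume.monitoring.type=http -Dflume.monitoring.port=34545); the host, the 
sink name k1, and the crude string parsing below are placeholders. The same 
JSON also exposes CHANNEL.<name>.ChannelFillPercentage for the 
channel-fullness check.

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Sketch: sample the JSON metrics endpoint twice and derive a drain rate
// from the sink's EventDrainSuccessCount. The parsing is deliberately
// crude; a real tool would use a JSON library.
public class DrainRateProbe {
  static String fetch(String url) throws Exception {
    BufferedReader in = new BufferedReader(
        new InputStreamReader(new URL(url).openStream(), "UTF-8"));
    StringBuilder sb = new StringBuilder();
    for (String line; (line = in.readLine()) != null;) {
      sb.append(line);
    }
    in.close();
    return sb.toString();
  }

  // Extracts "EventDrainSuccessCount":"<n>" from the "SINK.<name>" section.
  static long drainCount(String json, String sinkName) {
    int section = json.indexOf("\"SINK." + sinkName + "\"");
    int key = json.indexOf("EventDrainSuccessCount", section);
    int start = json.indexOf('"', json.indexOf(':', key)) + 1;
    return Long.parseLong(json.substring(start, json.indexOf('"', start)));
  }

  public static void main(String[] args) throws Exception {
    String url = "http://localhost:34545/metrics"; // assumed host and port
    long before = drainCount(fetch(url), "k1");    // "k1" is a placeholder
    Thread.sleep(60000L);
    long after = drainCount(fetch(url), "k1");
    System.out.println("drain rate: " + (after - before) / 60.0 + " events/sec");
  }
}
{code}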


[jira] [Commented] (FLUME-2245) HDFS files with errors unable to close

2014-05-14 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997333#comment-13997333
 ] 

Hari Shreedharan commented on FLUME-2245:
-----------------------------------------

+1. I will run tests and commit this tomorrow.

> HDFS files with errors unable to close
> --------------------------------------
>
> Key: FLUME-2245
> URL: https://issues.apache.org/jira/browse/FLUME-2245
> Project: Flume
>  Issue Type: Bug
>Reporter: Juhani Connolly
> Attachments: FLUME-2245.patch, flume.log.1133, flume.log.file
>
>
> This is running on a snapshot of Flume 1.5, git hash 
> 99db32ccd163daf9d7685f0e8485941701e1133d.
> When a datanode goes unresponsive for a significant amount of time (for 
> example, a big GC), an append failure occurs, followed by repeated timeouts 
> in the log and a failure to close the stream. The relevant section of the 
> logs is attached (starting where the errors first appear).
> The same log repeats periodically, consistently running into a 
> TimeoutException.
> Restarting Flume (or presumably just the HDFSSink) resolves the issue.
> Probable cause is in the comments.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [VOTE] Apache Flume 1.5.0 RC1

2014-05-14 Thread Arvind Prabhakar
+1

* Verified signatures and checksums for both binary and source tarballs
* Rat check looks good on source tarball
* Nit: the NOTICE file has a dated header; it needs updating, but that is not a blocker

Regards,
Arvind Prabhakar


On Wed, May 7, 2014 at 3:28 PM, Hari Shreedharan
wrote:

> This is a vote for the next release of Apache Flume, version 1.5.0. We are
> voting on release candidate RC1.
>
> It fixes the following issues:
>   http://s.apache.org/4eQ
>
> *** Please cast your vote within the next 72 hours ***
>
> The tarball (*.tar.gz), signature (*.asc), and checksums (*.md5,
> *.sha1) for the source and binary artifacts can be found here:
>https://people.apache.org/~hshreedharan/apache-flume-1.5.0-rc1/
>
> Maven staging repo:
>   https://repository.apache.org/content/repositories/orgapacheflume-1001/
>
>
> The tag to be voted on:
>
>
> https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commit;h=8633220df808c4cd0c13d1cf0320454a94f1ea97
>
> Flume's KEYS file containing PGP keys we use to sign the release:
>   http://www.apache.org/dist/flume/KEYS
>
>
> Thanks,
> Hari
>


[jira] [Commented] (FLUME-2273) ElasticSearchSink: Add handling for header substitution in indexName

2014-05-14 Thread Deepak Subhramanian (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996346#comment-13996346
 ] 

Deepak Subhramanian commented on FLUME-2273:
--------------------------------------------

If I want to use the patch, what options do I pass in the ESSink 
configuration so that it takes one of the header fields and appends it to the 
index name? I get JSON and XML data. Since ES automatically parses the JSON 
data as an object, my XML messages are failing: both are created in the same 
index, and ES validates new messages against the mapping created for the JSON 
object. I would like to have them in the same index. Also, I recently noticed 
that the LogstashSerializer is not generating timestamp fields when I deploy 
the latest version of the code; it creates two timestamp fields with the 
Flume 1.4 version. 

> ElasticSearchSink: Add handling for header substitution in indexName
> --------------------------------------------------------------------
>
> Key: FLUME-2273
> URL: https://issues.apache.org/jira/browse/FLUME-2273
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: v1.4.0
>Reporter: Paul Merry
>Priority: Minor
> Attachments: FLUME-2273.patch, new_FLUME-2273-2.patch, 
> new_FLUME-2273-5.patch, new_FLUME-2273.patch
>
>
> The ElasticSearchSink would be improved by allowing for header substitution 
> in the indexName property.
> A use case is where the sink is an intermediate part of a chain and the index 
> name is required to identify the message origin; at present it can only be a 
> hardcoded value.
> The HDFS sink already supports header substitution, so a similar format would 
> maintain consistency. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)