Re: classpath conflict with spark internal libraries and the spark shell.

2016-09-09 Thread Colin Kincaid Williams
My bad, Gothos on IRC pointed me to the docs: http://jhz.name/2016/01/10/spark-classpath.html Thanks Gothos! On Fri, Sep 9, 2016 at 9:23 PM, Colin Kincaid Williams <disc...@uw.edu> wrote: > I'm using the spark shell v1.6.1. I have a classpath conflict, where I > have an exte

classpath conflict with spark internal libraries and the spark shell.

2016-09-09 Thread Colin Kincaid Williams
I'm using the spark shell v1.6.1. I have a classpath conflict: an external library (not OSS either :( , so I can't rebuild it) uses httpclient-4.5.2.jar. I launch the shell with spark-shell --jars file:/path/to/httpclient-4.5.2.jar However spark is using httpclient-4.3 internally. Then when I try to use

Re: Improving performance of a kafka spark streaming app

2016-06-22 Thread Colin Kincaid Williams
Streaming UI tab showing empty events and very different metrics than on 1.5.2 On Thu, Jun 23, 2016 at 5:06 AM, Colin Kincaid Williams <disc...@uw.edu> wrote: > After a bit of effort I moved from a Spark cluster running 1.5.2, to a > Yarn cluster running 1.6.1 jars. I'm still settin

Re: Improving performance of a kafka spark streaming app

2016-06-22 Thread Colin Kincaid Williams
related to running on the Spark 1.5.2 cluster. Also is the missing event count in the completed batches a bug? Should I file an issue? On Tue, Jun 21, 2016 at 9:04 PM, Colin Kincaid Williams <disc...@uw.edu> wrote: > Thanks @Cody, I will try that out. In the interim, I tried to validate &

Re: Improving performance of a kafka spark streaming app

2016-06-21 Thread Colin Kincaid Williams
ake HBase out of the equation and just measure what your read > performance is by doing something like > > createDirectStream(...).foreach(_.println) > > not take() or print() > > On Tue, Jun 21, 2016 at 3:19 PM, Colin Kincaid Williams <disc...@uw.edu> > wrote: >>

Re: Improving performance of a kafka spark streaming app

2016-06-21 Thread Colin Kincaid Williams
pic Partitions / Streaming Duration / maxRatePerPartition / any other spark settings or code changes that I should make to try to get a better consumption rate. Thanks for all the help so far, this is the first Spark application I have written. On Mon, Jun 20, 2016 at 12:32 PM, Colin Kincaid Williams <

Re: Improving performance of a kafka spark streaming app

2016-06-20 Thread Colin Kincaid Williams
and your average processing time is > 1.16 seconds, you're always going to be falling behind. That would > explain why you've built up an hour of scheduling delay after eight > hours of running. > > On Sat, Jun 18, 2016 at 4:40 PM, Colin Kincaid Williams <disc...@uw.edu>
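
The arithmetic behind that reply can be sketched as follows. The batch interval is cut off in the excerpt above, so the 1-second value used here is purely an assumption for illustration; only the 1.16-second processing time and the eight-hour run come from the thread. The function name is hypothetical.

```python
def scheduling_delay_seconds(batch_interval_s, processing_time_s, run_time_s):
    """Estimate the backlog built up when batches take longer than the interval."""
    # One batch is generated per interval for the whole run.
    batches = run_time_s / batch_interval_s
    # Each batch overruns its slot by this much; no backlog if it finishes early.
    overrun = max(0.0, processing_time_s - batch_interval_s)
    return batches * overrun


# Assumed 1 s batch interval (hypothetical); 1.16 s and 8 h are from the thread.
delay = scheduling_delay_seconds(1.0, 1.16, 8 * 3600)
print(f"{delay / 3600:.2f} hours of scheduling delay")
```

Under that assumed interval the estimate comes out a little over an hour, which is in line with the "hour of scheduling delay after eight hours" described in the reply.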

Re: Improving performance of a kafka spark streaming app

2016-06-18 Thread Colin Kincaid Williams
for details including shuffles >> etc? >> >> HTH >> >> Dr Mich Talebzadeh >> >> >> >> LinkedIn >> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >> >> >> >> http://talebzadehmich.

Re: Improving performance of a kafka spark streaming app

2016-06-18 Thread Colin Kincaid Williams
I'm attaching a picture from the streaming UI. On Sat, Jun 18, 2016 at 7:59 PM, Colin Kincaid Williams <disc...@uw.edu> wrote: > There are 25 nodes in the spark cluster. > > On Sat, Jun 18, 2016 at 7:53 PM, Mich Talebzadeh > <mich.talebza...@gmail.com> wrote: >> how

Re: Improving performance of a kafka spark streaming app

2016-06-18 Thread Colin Kincaid Williams
> > > Dr Mich Talebzadeh > > > > LinkedIn > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > > > > http://talebzadehmich.wordpress.com > > > > > On 18 June 2016 at 20:40, Colin Kincaid Williams <disc...@uw.edu>

Re: Improving performance of a kafka spark streaming app

2016-06-18 Thread Colin Kincaid Williams
ib/hbase/lib/* \ /home/colin.williams/kafka-hbase.jar "FromTable" "ToTable" "broker1:9092,broker2:9092" On Tue, May 3, 2016 at 8:20 PM, Colin Kincaid Williams <disc...@uw.edu> wrote: > Thanks Cody, I can see that the partitions are well distributed... > Then I'm in the

Re: Improving performance of a kafka spark streaming app

2016-05-03 Thread Colin Kincaid Williams
as producers are distributing across partitions evenly). > > On Tue, May 3, 2016 at 1:44 PM, Colin Kincaid Williams <disc...@uw.edu> wrote: >> Thanks again Cody. Regarding the details 66 kafka partitions on 3 >> kafka servers, likely 8 core systems with 10 disks each. Maybe the >

Re: Improving performance of a kafka spark streaming app

2016-05-03 Thread Colin Kincaid Williams
> Really though, I'd try to start with spark 1.6 and direct streams, or > even just kafkacat, as a baseline. > > > > On Mon, May 2, 2016 at 7:01 PM, Colin Kincaid Williams <disc...@uw.edu> wrote: >> Hello again. I searched for "backport kafka" in the list archive

Re: Improving performance of a kafka spark streaming app

2016-05-02 Thread Colin Kincaid Williams
.2, or is upgrading possible? The > kafka direct stream is available starting with 1.3. If you're stuck > on 1.2, I believe there have been some attempts to backport it, search > the mailing list archives. > > On Mon, May 2, 2016 at 12:54 PM, Colin Kincaid Williams <disc...@uw.edu>

Re: Improving performance of a kafka spark streaming app

2016-05-02 Thread Colin Kincaid Williams
ted to using spark 1.2, or is upgrading possible? The > kafka direct stream is available starting with 1.3. If you're stuck > on 1.2, I believe there have been some attempts to backport it, search > the mailing list archives. > > On Mon, May 2, 2016 at 12:54 PM, Colin Kincaid W

Re: Improving performance of a kafka spark streaming app

2016-05-02 Thread Colin Kincaid Williams
park, at least > to some extent. > > David Krieg | Enterprise Software Engineer > Early Warning > Direct: 480.426.2171 | Fax: 480.483.4628 | Mobile: 859.227.6173 > > > -----Original Message- > From: Colin Kincaid Williams [mailto:disc...@uw.edu] > Sent: Monday, May 02, 2

Improving performance of a kafka spark streaming app

2016-05-02 Thread Colin Kincaid Williams
I've written an application to get content from a kafka topic with 1.7 billion entries, get the protobuf serialized entries, and insert into hbase. Currently the environment that I'm running in is Spark 1.2. With 8 executors and 2 cores, and 2 jobs, I'm only getting between 0-2500 writes /
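
For a sense of scale, here is a rough back-of-the-envelope sketch of how long the topic would take to drain. The unit of the write rate is cut off in the excerpt, so "writes per second" is an assumption, and the function name is hypothetical.

```python
def drain_time_days(total_entries, writes_per_second):
    """Days needed to write every entry at a constant rate."""
    if writes_per_second <= 0:
        raise ValueError("write rate must be positive")
    seconds = total_entries / writes_per_second
    return seconds / 86400  # seconds in a day


# 1.7 billion entries at the top of the reported range, assumed to be per second.
days = drain_time_days(1.7e9, 2500)
print(f"about {days:.1f} days at 2500 writes/s")
```

Even at the upper end of the reported range this works out to roughly a week of continuous writing, which is why the rate matters for a topic of this size.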

Running out of memory locally launching multiple spark jobs using spark yarn / submit from shell script.

2016-01-17 Thread Colin Kincaid Williams
I launch around 30-60 of these jobs defined like start-job.sh in the background from a wrapper script. I wait about 30 seconds between launches, then the wrapper monitors yarn to determine when to launch more. There is a limit defined at around 60 jobs, but even if I set it to 30, I run out of
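
The wrapper described above might look roughly like the sketch below. This is not the poster's script: `launch_throttled`, `launch`, and `count_running` are hypothetical names, and the two callables are injected so the throttling logic can be read (and tried) without a cluster. It only illustrates the launch-and-wait pattern and does not by itself address the local memory use.

```python
import time


def launch_throttled(jobs, launch, count_running, max_running=30,
                     launch_gap_s=30, poll_s=10, sleep=time.sleep):
    """Launch every job while keeping at most max_running active at once."""
    launched = 0
    for job in jobs:
        # Wait for a free slot before starting the next job.
        while count_running() >= max_running:
            sleep(poll_s)
        launch(job)
        launched += 1
        # Pause between launches so each submission has time to settle.
        sleep(launch_gap_s)
    return launched
```

In practice `launch` could start start-job.sh in the background with `subprocess.Popen`, and `count_running` could count the entries reported by the YARN command-line client; both are left to the caller here as assumptions.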

Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-24 Thread Colin Kincaid Williams
/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver It may be slightly different for you if the resource manager and the history server are not on the same machine. Hope it will work for you as well! Christophe. On 24/02/2015 06:31, Colin Kincaid Williams wrote: Hi, I have been trying

Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-24 Thread Colin Kincaid Williams
the info in one place. On Tue, Feb 24, 2015 at 12:36 PM, Colin Kincaid Williams disc...@uw.edu wrote: Looks like in my tired state, I didn't mention spark the whole time. However, it might be implied by the application log above. Spark log aggregation appears to be working, since I can run

How to get yarn logs to display in the spark or yarn history-server?

2015-02-23 Thread Colin Kincaid Williams
Hi, I have been trying to get my yarn logs to display in the spark history-server or yarn history-server. I can see the log information yarn logs -applicationId application_1424740955620_0009 15/02/23 22:15:14 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to