Re: Spark and continuous integration

Sam Elamin Tue, 14 Mar 2017 05:43:34 -0700

Thank you both

Steve, that's a very interesting point. I have to admit I have never thought
of doing analysis over time on the tests, but it makes sense, as the failures
over time tell you quite a bit about your data platform.
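
A minimal sketch of that kind of analysis, assuming each CI run leaves behind
a JUnit-style XML report (the classic Ant-originated format mentioned further
down the thread). The function names, run labels and sample reports are
hypothetical and only for illustration; nobody on the thread posted this:

```python
import xml.etree.ElementTree as ET


def summarize_report(xml_text):
    """Return (total, failed) test counts for one JUnit-style XML report."""
    root = ET.fromstring(xml_text)
    total = failed = 0
    # iter() walks all descendants, so this works for a bare <testsuite>
    # as well as for a <testsuites> wrapper.
    for case in root.iter("testcase"):
        total += 1
        # A test case counts as failed if it has a <failure> or <error> child.
        if case.find("failure") is not None or case.find("error") is not None:
            failed += 1
    return total, failed


def failure_rates(reports):
    """Map each run label to its failure rate, given (label, xml_text) pairs."""
    rates = {}
    for label, xml_text in reports:
        total, failed = summarize_report(xml_text)
        rates[label] = failed / total if total else 0.0
    return rates


# Hypothetical reports from two consecutive builds.
RUN_1 = '<testsuite name="etl"><testcase name="a"/><testcase name="b"/></testsuite>'
RUN_2 = (
    '<testsuite name="etl"><testcase name="a"/>'
    '<testcase name="b"><failure message="row count mismatch"/></testcase>'
    '</testsuite>'
)

print(failure_rates([("build-1", RUN_1), ("build-2", RUN_2)]))
```

Feeding the per-build rates into whatever dashboard is already in place would
give the "failures over time" view without any extra CI plugin.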


Thanks for highlighting! We are using PySpark for now, so I hope some
frameworks help with that.

Previously we have built data sanity checks that look at counts and numbers
to produce graphs using statsd and Grafana (ELK stack), but they do not
necessarily look at test metrics.
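
For illustration only, a check like that can be as small as formatting a gauge
in the plain-text StatsD line format (`name:value|g`) and sending it over UDP.
The metric name, host and port below are placeholders, not the actual setup
described here:

```python
import socket


def gauge_line(name, value):
    """Format a gauge in the StatsD plain-text line protocol."""
    return f"{name}:{value}|g"


def send_gauge(name, value, host="127.0.0.1", port=8125):
    """Send one gauge to a StatsD daemon over UDP (fire and forget)."""
    payload = gauge_line(name, value).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical metric: the row count of a table after a job has run.
print(gauge_line("orders.row_count", 1200))
```

In a PySpark job the value would typically come from something like a
DataFrame count, passed to `send_gauge` once the job has finished.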


I'll definitely check it out.

Kind regards
Sam
On Tue, 14 Mar 2017 at 11:57, Jörn Franke <jornfra...@gmail.com> wrote:

> I agree the reporting is an important aspect. SonarQube (or a similar tool)
> can report over time, but does not support Scala (well, indirectly via
> JaCoCo). In the end, you will need to think about a dashboard that displays
> results over time.
>
> On 14 Mar 2017, at 12:44, Steve Loughran <ste...@hortonworks.com> wrote:
>
>
> On 13 Mar 2017, at 13:24, Sam Elamin <hussam.ela...@gmail.com> wrote:
>
> Hi Jorn
>
> Thanks for the prompt reply. Really we have 2 main concerns with CD:
> ensuring tests pass and linting the code.
>
>
> I'd add "providing diagnostics when tests fail", which is a combination
> of: tests providing useful information and CI tooling collecting all those
> results and presenting them meaningfully. The hard parts are invariably (at
> least for me)
>
> - what to do about the intermittent failures
> - the tradeoff between thorough testing and fast testing, especially when
>   thorough means "better/larger datasets"
>
> You can consider the output of Jenkins & tests as data sources for your
> own analysis too: track failure rates over time, test runs over time, etc.
> It could be interesting. If you want to go there, then the question becomes
> "which CI tooling produces the most interesting machine-parseable results,
> above and beyond the classic Ant-originated XML test run reports?"
>
> I have mixed feelings about ScalaTest there: I think the expression
> language is good, but the Maven test runner doesn't report that well, at
> least for me:
>
>
> https://steveloughran.blogspot.co.uk/2016/09/scalatest-thoughts-and-ideas.html
>
>
>
> I think all platforms should handle this with ease, I was just wondering
> what people are using.
>
> Jenkins seems to have the best Spark plugins, so we are investigating that
> as well as a variety of other hosted CI tools.
>
> Happy to write a blog post detailing our findings and share it here if
> people are interested.
>
>
> Regards
> Sam
>
> On Mon, Mar 13, 2017 at 1:18 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
> Hi,
>
> Jenkins also now supports pipeline as code and multibranch pipelines. Thus
> you are not so dependent on the UI and you no longer need a long list
> of jobs for different branches. Additionally it has a new UI (beta) called
> Blue Ocean, which is a little bit nicer. You may also check GoCD. Aside from
> this you have a huge variety of commercial tools, e.g. Bamboo.
> In the cloud, I use Travis CI for my open source GitHub projects, but
> there are also a lot of alternatives, e.g. Distelli.
>
> It really depends on what you expect, e.g. whether you want to version the
> build pipeline in Git, whether you need Docker deployment, etc. I am not sure
> that new starters should be responsible for the build pipeline, so I am not
> sure that I understand your concern in this area.
>
> From my experience, integration tests for Spark can be run on any of these
> platforms.
>
> Best regards
>
> > On 13 Mar 2017, at 10:55, Sam Elamin <hussam.ela...@gmail.com> wrote:
> >
> > Hi Folks
> >
> > This is more of a general question. What's everyone using for their
> > CI/CD when it comes to Spark?
> >
> > We are using PySpark but are potentially looking to move to Spark with
> > Scala and sbt in the future.
> >
> >
> > One of the suggestions was Jenkins, but I know the UI isn't great for new
> > starters so I'd rather avoid it. I've used TeamCity, but that was more
> > focused on .NET development.
> >
> >
> > What are people using?
> >
> > Kind Regards
> > Sam
>
>
>
>
