Re: Spark and continuous integration

2017-03-14 Thread Sam Elamin
Thank you both

Steve, that's a very interesting point. I have to admit I had never thought
of doing analysis over time on the tests, but it makes sense: the failures
over time tell you quite a bit about your data platform.
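As a rough sketch of what that analysis could look like (assuming the CI tool emits the usual JUnit-style XML reports that Steve mentions below; the element and attribute names follow that convention, and the test names are made up), per-test failure rates across runs can be computed with nothing but the standard library:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def failed_tests(report_xml):
    """Names of test cases that failed or errored in one JUnit-style report."""
    root = ET.fromstring(report_xml)
    return [tc.get("name") for tc in root.iter("testcase")
            if tc.find("failure") is not None or tc.find("error") is not None]

def failure_rates(reports):
    """Fraction of runs in which each test failed, across many reports.
    High-but-not-1.0 rates flag the intermittent failures."""
    counts = Counter(name for report in reports for name in failed_tests(report))
    return {name: n / len(reports) for name, n in counts.items()}

# Two hypothetical runs of the same suite: one green, one with a flaky failure.
good_run = '<testsuite><testcase name="testJoin"/><testcase name="testShuffle"/></testsuite>'
bad_run = ('<testsuite><testcase name="testJoin"/>'
           '<testcase name="testShuffle"><failure message="timeout"/></testcase>'
           '</testsuite>')
print(failure_rates([good_run, bad_run, bad_run]))  # an always-green test never appears
```

Feeding this the archived reports from each nightly build would give exactly the failure-rate-over-time view discussed in the thread.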

Thanks for highlighting! We are using PySpark for now, so I hope some
frameworks help with that.

Previously we have built data sanity checks that look at counts and numbers
to produce graphs using StatsD and Grafana (ELK stack), but not necessarily
looking at test metrics.
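For context, a minimal sketch of that kind of sanity check (assuming the plain-text StatsD line protocol over UDP on the default port 8125; the metric name and the `df.count()` hookup are illustrative, not from our actual setup):

```python
import socket

def emit_gauge(metric, value, host="localhost", port=8125):
    """Send one gauge to StatsD using the plain-text line protocol
    ("<name>:<value>|g") over fire-and-forget UDP."""
    payload = f"{metric}:{value}|g"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))
    return payload  # returned so callers can log/inspect what was sent

# In a PySpark job the value would come from something like:
#   row_count = spark.table("events").count()   # hypothetical table name
row_count = 12345  # stand-in for a real count
emit_gauge("pipeline.events.row_count", row_count)
```

Grafana can then graph the gauge over time and alert when a count drops unexpectedly.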


I'll definitely check it out

Kind regards
Sam
On Tue, 14 Mar 2017 at 11:57, Jörn Franke  wrote:

> I agree the reporting is an important aspect. SonarQube (or a similar tool)
> can report over time, but does not support Scala (well, indirectly via
> JaCoCo). In the end, you will need to think about a dashboard that displays
> results over time.
>
> On 14 Mar 2017, at 12:44, Steve Loughran  wrote:
>
>
> On 13 Mar 2017, at 13:24, Sam Elamin  wrote:
>
> Hi Jorn
>
> Thanks for the prompt reply. Really we have 2 main concerns with CD:
> ensuring tests pass and linting the code.
>
>
> I'd add "providing diagnostics when tests fail", which is a combination of
> tests providing useful information and CI tooling collecting all those
> results and presenting them meaningfully. The hard parts are invariably (at
> least for me):
>
> - what to do about the intermittent failures
> - the tradeoff between thorough testing and fast testing, especially when
> thorough means "better/larger datasets"
>
> You can consider the output of Jenkins & tests as data sources for your
> own analysis too: track failure rates over time, test runs over time, etc.;
> could be interesting. If you want to go there, then the question becomes
> which CI tooling produces the most interesting machine-parseable results,
> above and beyond the classic Ant-originated XML test run reports.
>
> I have mixed feelings about ScalaTest there: I think the expression
> language is good, but the Maven test runner doesn't report that well, at
> least for me:
>
>
> https://steveloughran.blogspot.co.uk/2016/09/scalatest-thoughts-and-ideas.html
>
>
>
> I think all platforms should handle this with ease; I was just wondering
> what people are using.
>
> Jenkins seems to have the best Spark plugins, so we are investigating that
> as well as a variety of other hosted CI tools.
>
> Happy to write a blog post detailing our findings and share it here if
> people are interested.
>
>
> Regards
> Sam
>
> On Mon, Mar 13, 2017 at 1:18 PM, Jörn Franke  wrote:
>
> Hi,
>
> Jenkins now also supports pipeline-as-code and multibranch pipelines, so
> you are not so dependent on the UI and no longer need a long list of jobs
> for different branches. Additionally, it has a new UI (beta) called Blue
> Ocean, which is a little nicer. You may also check GoCD. Aside from these,
> you have a huge variety of commercial tools, e.g. Bamboo.
> In the cloud, I use Travis CI for my open source GitHub projects, but
> there are also a lot of alternatives, e.g. Distelli.
>
> It really depends what you expect: e.g. whether you want to version the
> build pipeline in Git, or whether you need Docker deployment. I am not sure
> that new starters should be responsible for the build pipeline, so I am not
> sure that I understand your concern in this area.
>
> From my experience, integration tests for Spark can be run on any of these
> platforms.
>
> Best regards
>
> > On 13 Mar 2017, at 10:55, Sam Elamin  wrote:
> >
> > Hi Folks
> >
> > This is more of a general question: what's everyone using for their CI/CD
> when it comes to Spark?
> >
> > We are using PySpark, but potentially looking to move to Spark with
> Scala and sbt in the future.
> >
> >
> > One of the suggestions was Jenkins, but I know the UI isn't great for
> new starters, so I'd rather avoid it. I've used TeamCity, but that was
> more focused on .NET development.
> >
> >
> > What are people using?
> >
> > Kind Regards
> > Sam
>
>
>
>


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org