Re: Spark and continuous integration
Thank you both. Steve, that's a very interesting point. I have to admit I had never thought of doing analysis over time on the tests, but it makes sense: the failures over time tell you quite a bit about your data platform. Thanks for highlighting it!

We are using PySpark for now, so I hope some frameworks help with that. Previously we built data sanity checks that look at counts and numbers to produce graphs using StatsD and Grafana (plus the ELK stack), but not necessarily looking at test metrics. I'll definitely check it out.

Kind regards
Sam
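The count-based sanity checks Sam describes can be wired to StatsD in a few lines, since StatsD gauges are just UDP datagrams. A minimal sketch; the `statsd_gauge` helper, the metric name, and the stand-in for `df.count()` are illustrative, not from the thread:

```python
import socket

def statsd_gauge(name, value, host="127.0.0.1", port=8125):
    """Format and send a StatsD gauge datagram (fire-and-forget UDP)."""
    payload = f"{name}:{value}|g".encode("ascii")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload, (host, port))
    sock.close()
    return payload.decode("ascii")  # returned so callers can log/inspect it

# In a PySpark job this would typically follow a df.count() call:
row_count = 1_000_000  # stand-in for df.count()
line = statsd_gauge("etl.orders.row_count", row_count)
```

Because UDP is connectionless, a missing StatsD daemon does not fail the job, which is usually what you want for metrics emitted from a production pipeline.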
Re: Spark and continuous integration
I agree the reporting is an important aspect. SonarQube (or a similar tool) can report over time, but it does not support Scala (well, indirectly via JaCoCo). In the end, you will need to think about a dashboard that displays results over time.
Re: Spark and continuous integration
> On 13 Mar 2017, at 13:24, Sam Elamin wrote:
>
> Hi Jorn
>
> Thanks for the prompt reply. Really we have 2 main concerns with CD:
> ensuring tests pass and linting the code.

I'd add "providing diagnostics when tests fail", which is a combination of tests providing useful information and CI tooling collecting all those results and presenting them meaningfully. The hard parts are invariably (at least for me):

- what to do about the intermittent failures
- the tradeoff between thorough testing and fast testing, especially when "thorough" means "better/larger datasets"

You can consider the output of Jenkins and the tests as data sources for your own analysis too: track failure rates over time, test runs over time, etc. That could be interesting. If you want to go there, then there is the question of which CI toolings produce the most interesting machine-parseable results, above and beyond the classic Ant-originated XML test run reports.

I have mixed feelings about ScalaTest there: I think the expression language is good, but the Maven test runner doesn't report that well, at least for me:

https://steveloughran.blogspot.co.uk/2016/09/scalatest-thoughts-and-ideas.html
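Steve's idea of treating test runs as a data source is straightforward to start on, because the Ant-originated XML reports are easy to parse. A rough sketch; the attribute names follow the common JUnit `testsuite` schema, and the sample report is made up:

```python
import xml.etree.ElementTree as ET

def failure_rate(junit_xml):
    """Return (failed, total, rate) from a JUnit-style XML test report."""
    suite = ET.fromstring(junit_xml)
    total = int(suite.get("tests", 0))
    failed = int(suite.get("failures", 0)) + int(suite.get("errors", 0))
    return failed, total, (failed / total if total else 0.0)

# A made-up report, standing in for one build's TEST-*.xml output:
sample = '<testsuite tests="40" failures="3" errors="1"></testsuite>'
failed, total, rate = failure_rate(sample)
```

Run over every build's archived reports, this gives the failure-rate-over-time series that could feed the kind of dashboard Jörn mentions.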
Re: Spark and continuous integration
Hi Jorn

Thanks for the prompt reply. Really we have 2 main concerns with CD: ensuring tests pass and linting the code.

I think all platforms should handle this with ease; I was just wondering what people are using.

Jenkins seems to have the best Spark plugins, so we are investigating that as well as a variety of other hosted CI tools.

Happy to write a blog post detailing our findings and sharing it here if people are interested.

Regards
Sam
Re: Spark and continuous integration
Hi,

Jenkins also now supports pipeline as code and multibranch pipelines, thus you are not so dependent on the UI and you no longer need a long list of jobs for different branches. Additionally it has a new UI (beta) called Blue Ocean, which is a little bit nicer. You may also check GoCD. Aside from this you have a huge variety of commercial tools, e.g. Bamboo.

In the cloud, I use Travis CI for my open source GitHub projects, but there are also a lot of alternatives, e.g. Distelli.

It really depends what you expect, e.g. whether you want to version the build pipeline in Git, or whether you need Docker deployment, etc. I am not sure that new starters should be responsible for the build pipeline, thus I am not sure that I understand your concern in this area.

From my experience, integration tests for Spark can be run on any of these platforms.

Best regards

> On 13 Mar 2017, at 10:55, Sam Elamin wrote:
>
> Hi Folks
>
> This is more of a general question: what's everyone using for their CI/CD when it comes to Spark?
>
> We are using PySpark but potentially looking to move to Spark Scala and sbt in the future.
>
> One of the suggestions was Jenkins, but I know the UI isn't great for new starters so I'd rather avoid it. I've used TeamCity but that was more focused on .NET development.
>
> What are people using?
>
> Kind Regards
> Sam
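Jörn's point that Spark integration tests run on any of these platforms holds because a local-mode `SparkSession` needs nothing from the CI provider beyond a JVM. A minimal sketch of such a test; the class name, app name, and dataset are illustrative, and it skips cleanly on machines without a usable Spark:

```python
import unittest

try:
    from pyspark.sql import SparkSession
except ImportError:  # pyspark not on this CI image
    SparkSession = None

class WordCountIT(unittest.TestCase):
    """A local-mode Spark integration test, runnable on any CI platform."""

    @classmethod
    def setUpClass(cls):
        if SparkSession is None:
            raise unittest.SkipTest("pyspark is not installed")
        try:
            cls.spark = (SparkSession.builder
                         .master("local[2]")
                         .appName("ci-smoke-test")
                         .getOrCreate())
        except Exception as exc:  # e.g. no JVM available
            raise unittest.SkipTest(f"no usable local Spark: {exc}")

    @classmethod
    def tearDownClass(cls):
        if getattr(cls, "spark", None) is not None:
            cls.spark.stop()

    def test_row_count(self):
        df = self.spark.createDataFrame([("a",), ("b",), ("a",)], ["word"])
        self.assertEqual(df.count(), 3)
```

Because it only depends on `python -m unittest`, the same test runs unchanged under Jenkins, Travis CI, GoCD, or Bamboo.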