Archiving XML test reports for analysis

2014-12-15 Thread Nicholas Chammas
Every time we run a test cycle on our Jenkins cluster, we generate hundreds of XML reports covering all the tests we have (e.g. `streaming/target/test-reports/org.apache.spark.streaming.util.WriteAheadLogSuite.xml`). These reports contain interesting information about whether tests succeeded or

Re: spark kafka batch integration

2014-12-15 Thread Cody Koeninger
For an alternative take on a similar idea, see https://github.com/koeninger/spark-1/tree/kafkaRdd/external/kafka/src/main/scala/org/apache/spark/rdd/kafka An advantage of the approach I'm taking is that the lower and upper offsets of the RDD are known in advance, so it's deterministic. I

Spark Web Site

2014-12-15 Thread Pei Chen
Hi Spark Dev, The cTAKES community was looking at revamping their web site and really liked the clean look of the Spark one. Is it just using Bootstrap and static html pages checked into SVN? Or are you using the Apache CMS somehow? Mind if we borrow the layout? --Pei

Re: Archiving XML test reports for analysis

2014-12-15 Thread shane knapp
right now, the following logs are archived on to the master: local log_files=$( find . -name unit-tests.log -o -path ./sql/hive/target/HiveCompatibilitySuite.failed -o -path ./sql/hive/target/HiveCompatibilitySuite.hiveFailed -o -path

Re: Spark Web Site

2014-12-15 Thread Matei Zaharia
It's just Bootstrap checked into SVN and built using Jekyll. You can check out the raw source files from SVN from https://svn.apache.org/repos/asf/spark. IMO it's fine if you guys use the layout, but just make sure it doesn't look exactly the same because otherwise both sites will look like

Re: Spark JIRA Report

2014-12-15 Thread Andrew Ash
Nick, Putting the N most stale issues into a report like your latest one does seem like a good way to tackle the wall of text effect that I'm worried about. On Sun, Dec 14, 2014 at 12:28 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Taking after Andrew’s suggestion, perhaps the

Re: Test failures after Jenkins upgrade

2014-12-15 Thread Josh Rosen
There’s a JIRA for this: https://issues.apache.org/jira/browse/SPARK-4826 And two open PRs: https://github.com/apache/spark/pull/3695 https://github.com/apache/spark/pull/3701 We might be close to fixing this via one of those PRs, so maybe we should try using one of those instead? On December

Re: Test failures after Jenkins upgrade

2014-12-15 Thread Patrick Wendell
Ah cool Josh - I think for some reason we are hitting this every time now. Since this is holding up a bunch of other patches, I just pushed something ignoring the tests as a hotfix. Even waiting for a couple hours is really expensive productivity-wise given the frequency with which we run tests.

Re: CrossValidator API in new spark.ml package

2014-12-15 Thread Xiangrui Meng
Yes, regularization path could be viewed as training multiple models at once. -Xiangrui On Sat, Dec 13, 2014 at 6:53 AM, DB Tsai dbt...@dbtsai.com wrote: Okay, I got it. In Estimator, fit(dataset: SchemaRDD, paramMaps: Array[ParamMap]): Seq[M] can be overwritten to implement regularization

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-15 Thread Xiangrui Meng
Hi Krishna, Thanks for providing the notebook! I tried and found that the problem is with PySpark's zip. I created a JIRA to track the issue: https://issues.apache.org/jira/browse/SPARK-4841 -Xiangrui On Thu, Dec 11, 2014 at 1:55 PM, Krishna Sankar ksanka...@gmail.com wrote: K-Means iPython

Re: Archiving XML test reports for analysis

2014-12-15 Thread Nicholas Chammas
How about all of them https://amplab.cs.berkeley.edu/jenkins/view/Spark/? How much data per day would it roughly be if we uploaded all the logs for all these builds? Also, would Databricks be willing to offer up an S3 bucket for this purpose? Nick On Mon Dec 15 2014 at 11:48:44 AM shane knapp

Re: Archiving XML test reports for analysis

2014-12-15 Thread shane knapp
i have no problem w/storing all of the logs. :) i also have no problem w/donated S3 buckets. :) On Mon, Dec 15, 2014 at 2:39 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: How about all of them https://amplab.cs.berkeley.edu/jenkins/view/Spark/? How much data per day would it

Re: Tachyon in Spark

2014-12-15 Thread Jun Feng Liu
Thanks for the response. I got the point - sounds like today's Spark lineage does not push to Tachyon lineage. Would be good to see how it works. Jun Feng Liu. Haoyuan Li