Every time we run a test cycle on our Jenkins cluster, we generate hundreds
of XML reports covering all the tests we have (e.g.
`streaming/target/test-reports/org.apache.spark.streaming.util.WriteAheadLogSuite.xml`).
These reports contain interesting information about whether tests succeeded or failed.
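For example, a small Scala sketch along these lines can pull the counts and per-test timings out of one of those files (the attribute names follow the usual JUnit report layout; this is an illustration, not something we run today):

    import scala.xml.XML

    object TestReportSummary {
      def main(args: Array[String]): Unit = {
        // One JUnit-style report produced by a test run (path from the example above).
        val report = XML.loadFile(
          "streaming/target/test-reports/org.apache.spark.streaming.util.WriteAheadLogSuite.xml")

        // The <testsuite> root carries aggregate counts as attributes.
        val tests    = (report \ "@tests").text
        val failures = (report \ "@failures").text
        println(s"tests=$tests, failures=$failures")

        // Each <testcase> element records one test, its duration, and any failure.
        for (tc <- report \\ "testcase") {
          val name   = (tc \ "@name").text
          val time   = (tc \ "@time").text
          val failed = (tc \ "failure").nonEmpty
          println(s"$name took ${time}s, failed=$failed")
        }
      }
    }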
For an alternative take on a similar idea, see
https://github.com/koeninger/spark-1/tree/kafkaRdd/external/kafka/src/main/scala/org/apache/spark/rdd/kafka
An advantage of the approach I'm taking is that the lower and upper offsets
of the RDD are known in advance, so it's deterministic.
I
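To make the known-offsets point concrete, here is a tiny Scala sketch of a partition description pinned to a fixed offset range at RDD creation time (the names are illustrative only, not the API in either branch):

    // Each partition is described up front by an immutable offset range, so a retried
    // or recomputed partition re-reads exactly [fromOffset, untilOffset) and nothing else.
    case class KafkaOffsetRange(
        topic: String,
        partition: Int,
        fromOffset: Long,    // inclusive lower bound, fixed when the RDD is defined
        untilOffset: Long)   // exclusive upper bound, fixed when the RDD is defined

    val ranges = Seq(
      KafkaOffsetRange("events", partition = 0, fromOffset = 100L, untilOffset = 200L),
      KafkaOffsetRange("events", partition = 1, fromOffset = 150L, untilOffset = 250L))

    ranges.foreach(r =>
      println(s"${r.topic}-${r.partition}: offsets ${r.fromOffset} until ${r.untilOffset}"))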
Hi Spark Dev,
The cTAKES community was looking at revamping their web site and really
liked the clean look of the Spark one. Is it just using Bootstrap and
static html pages checked into SVN? Or are you using the Apache CMS
somehow? Mind if we borrow the layout?
--Pei
Right now, the following logs are archived onto the master:
# gather the per-module unit test logs plus the Hive compatibility suite failure lists
local log_files=$(
  find . \
    -name "unit-tests.log" -o \
    -path "./sql/hive/target/HiveCompatibilitySuite.failed" -o \
    -path "./sql/hive/target/HiveCompatibilitySuite.hiveFailed" -o \
    -path
It's just Bootstrap checked into SVN and built using Jekyll. You can check out
the raw source files from SVN at https://svn.apache.org/repos/asf/spark. IMO
it's fine if you guys use the layout, but just make sure it doesn't look
exactly the same because otherwise both sites will look like
Nick,
Putting the N most stale issues into a report like your latest one does
seem like a good way to tackle the wall of text effect that I'm worried
about.
On Sun, Dec 14, 2014 at 12:28 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
Taking after Andrew’s suggestion, perhaps the
There’s a JIRA for this: https://issues.apache.org/jira/browse/SPARK-4826
And two open PRs:
https://github.com/apache/spark/pull/3695
https://github.com/apache/spark/pull/3701
We might be close to fixing this via one of those PRs, so maybe we should try
using one of those instead?
On December
Ah cool Josh - I think for some reason we are hitting this every time
now. Since this is holding up a bunch of other patches, I just pushed
something ignoring the tests as a hotfix. Even waiting for a couple
hours is really expensive productivity-wise given the frequency with
which we run tests.
Yes, the regularization path could be viewed as training multiple models
at once. -Xiangrui
On Sat, Dec 13, 2014 at 6:53 AM, DB Tsai dbt...@dbtsai.com wrote:
Okay, I got it. In Estimator, fit(dataset: SchemaRDD, paramMaps:
Array[ParamMap]): Seq[M] can be overridden to implement the
regularization path.
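A rough sketch of the idea, using simplified stand-in types rather than the real SchemaRDD/ParamMap/Estimator classes (the trait and the warm-start "optimizer" below are placeholders invented to show the structure, not the actual pipeline API):

    // Simplified stand-ins for the spark.ml types mentioned above.
    case class ParamMap(regParam: Double)
    case class Dataset(nFeatures: Int)                       // stand-in for SchemaRDD
    case class Model(regParam: Double, weights: Array[Double])

    trait Estimator {
      def fit(dataset: Dataset, paramMap: ParamMap): Model
      // Default multi-ParamMap fit: one independent run per parameter setting.
      def fit(dataset: Dataset, paramMaps: Array[ParamMap]): Seq[Model] =
        paramMaps.toSeq.map(pm => fit(dataset, pm))
    }

    // Overriding the multi-ParamMap variant lets a single sweep produce the whole
    // regularization path, warm-starting each lambda from the previous solution.
    class PathEstimator extends Estimator {
      def fit(dataset: Dataset, paramMap: ParamMap): Model =
        Model(paramMap.regParam,
          train(Array.fill(dataset.nFeatures)(0.0), paramMap.regParam))

      override def fit(dataset: Dataset, paramMaps: Array[ParamMap]): Seq[Model] = {
        var w = Array.fill(dataset.nFeatures)(0.0)
        paramMaps.toSeq.sortBy(-_.regParam).map { pm =>
          w = train(w, pm.regParam)                          // reuse the previous weights
          Model(pm.regParam, w)
        }
      }

      // Placeholder optimizer: only the warm-start structure matters for this sketch.
      private def train(init: Array[Double], reg: Double): Array[Double] = init
    }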
Hi Krishna,
Thanks for providing the notebook! I tried and found that the problem
is with PySpark's zip. I created a JIRA to track the issue:
https://issues.apache.org/jira/browse/SPARK-4841
-Xiangrui
On Thu, Dec 11, 2014 at 1:55 PM, Krishna Sankar ksanka...@gmail.com wrote:
K-Means iPython
How about all of them https://amplab.cs.berkeley.edu/jenkins/view/Spark/? How
much data per day would it roughly be if we uploaded all the logs for all
these builds?
Also, would Databricks be willing to offer up an S3 bucket for this purpose?
Nick
On Mon Dec 15 2014 at 11:48:44 AM shane knapp
i have no problem w/storing all of the logs. :)
i also have no problem w/donated S3 buckets. :)
On Mon, Dec 15, 2014 at 2:39 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
How about all of them https://amplab.cs.berkeley.edu/jenkins/view/Spark/?
How
much data per day would it
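If a bucket does materialize, the upload side is simple enough. Here is a rough Scala sketch using the AWS SDK for Java as a dependency; the bucket name, log directory, and key prefix are made-up placeholders, and it assumes Jenkins' BUILD_NUMBER is set in the environment:

    import java.io.File
    import com.amazonaws.services.s3.AmazonS3Client

    object UploadJenkinsLogs {
      def main(args: Array[String]): Unit = {
        val bucket = "spark-jenkins-logs"                // hypothetical donated bucket
        val logDir = new File("target/archived-logs")    // wherever the archived logs end up
        val s3     = new AmazonS3Client()                // credentials come from the environment

        // Recursively collect every archived log file.
        def walk(f: File): Seq[File] =
          if (f.isDirectory) Option(f.listFiles()).toSeq.flatten.flatMap(walk) else Seq(f)

        val files   = walk(logDir)
        val totalMb = files.map(_.length).sum / (1024 * 1024)
        println(s"uploading ${files.size} files, about $totalMb MB")

        // Push each file under a per-build prefix so runs don't overwrite each other.
        val build = sys.env.getOrElse("BUILD_NUMBER", "unknown")
        files.foreach { f =>
          s3.putObject(bucket, s"builds/$build/${f.getName}", f)
        }
      }
    }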
Thanks for the response. I got the point - sounds like today's Spark lineage
does not push to Tachyon lineage. Would be good to see how it works.
Jun Feng Liu.
Haoyuan Li