Re: Suggest a workaround for the org.eclipse.jetty.orbit problem with SBT 0.13.2-RC1

2014-03-25 Thread Will Benton
- Original Message - At last, I worked around this issue by updating my local SBT to 0.13.2-RC1. If any of you are experiencing a similar problem, I suggest you upgrade your local SBT version. If this issue is causing grief for anyone on Fedora 20, know that you can install sbt via yum
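A minimal sketch of that upgrade path on Fedora 20, assuming the `sbt` package in the default Fedora repositories is at a recent enough version:

```shell
# Install sbt from the Fedora repositories (requires root)
sudo yum install sbt

# Verify which sbt version the launcher picked up
sbt sbtVersion
```

A project can also pin its own sbt version via `project/build.properties`, which overrides whatever the system launcher defaults to.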

Re: [VOTE] Release Apache Spark 0.9.1 (RC3)

2014-03-28 Thread Will Benton
RC3 works with the applications I'm working on now, and MLlib performance is indeed perceptibly improved over 0.9.0 (although I haven't done a real evaluation). Also, from the downstream perspective, I've been tracking the 0.9.1 RCs in Fedora and have no issues to report there either:

Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-05-28 Thread Will Benton
+1 I made the necessary interface changes to my apps that use MLlib and tested all of my code against rc11 on Fedora 20 and OS X 10.9.3. (The Fedora Rawhide package remains at 0.9.1 pending some additional dependency packaging work.) best, wb - Original Message - From: Tathagata

ContextCleaner, weak references, and serialization

2014-05-28 Thread Will Benton
Friends, For context (so to speak), I did some work in the 0.9 timeframe to fix SPARK-897 (provide immediate feedback when closures aren't serializable) and SPARK-729 (make sure that free variables in closures are captured when the RDD transformations are declared). I currently have a branch

Re: Kryo serialization for closures: a workaround

2014-05-28 Thread Will Benton
This is an interesting approach, Nilesh! Someone will correct me if I'm wrong, but I don't think this could go into ClosureCleaner as a default behavior (since Kryo apparently breaks on some classes that depend on custom Java serializers, as has come up on the list recently). But it does seem

question about Hive compatibility tests

2014-06-18 Thread Will Benton
Hi all, Does a Failed to generate golden answer for query message from HiveComparisonTests indicate that it isn't possible to run the query in question under Hive from Spark's test suite rather than anything about Spark's implementation of HiveQL? The stack trace I'm getting implicates Hive

Re: question about Hive compatibility tests

2014-06-18 Thread Will Benton
I assume you are adding tests, because that is the only time you should see that message. Yes, I had added the HAVING test to the whitelist. That error could mean a couple of things: 1) the query is invalid and Hive threw an exception, or 2) your Hive setup is bad. Regarding #2, you need

Re: Scala examples for Spark do not work as written in documentation

2014-06-20 Thread Will Benton
Hey, sorry to reanimate this thread, but just a quick question: why do the examples (on http://spark.apache.org/examples.html) use spark for the SparkContext reference? This is minor, but it seems like it could be a little confusing for people who want to run them in the shell and need to

odd test suite failures while adding functions to Catalyst

2014-07-08 Thread Will Benton
Hi all, I was testing an addition to Catalyst today (reimplementing a Hive UDF) and ran into some odd failures in the test suite. In particular, it seems that what most of these have in common is that an array is spuriously reversed somewhere. For example, the stddev tests in the

Profiling Spark tests with YourKit (or something else)

2014-07-14 Thread Will Benton
Hi all, I've been evaluating YourKit and would like to profile the heap and CPU usage of certain tests from the Spark test suite. In particular, I'm very interested in tracking heap usage by allocation site. Unfortunately, I get a lot of crashes running Spark tests with profiling (and thus

Re: Profiling Spark tests with YourKit (or something else)

2014-07-14 Thread Will Benton
). Maybe they are very close to full and profiling pushes them over the edge. Matei On Jul 14, 2014, at 9:51 AM, Will Benton wi...@redhat.com wrote: Hi all, I've been evaluating YourKit and would like to profile the heap and CPU usage of certain tests from the Spark test suite

Re: Profiling Spark tests with YourKit (or something else)

2014-07-14 Thread Will Benton
- Original Message - From: Aaron Davidson ilike...@gmail.com To: dev@spark.apache.org Sent: Monday, July 14, 2014 5:21:10 PM Subject: Re: Profiling Spark tests with YourKit (or something else) Out of curiosity, what problems are you seeing with Utils.getCallSite? Aaron, if I enable

Re: Profiling Spark tests with YourKit (or something else)

2014-07-14 Thread Will Benton
with YourKit (or something else) Would you mind filing a JIRA for this? That does sound like something bogus happening on the JVM/YourKit level, but this sort of diagnosis is sufficiently important that we should be resilient against it. On Mon, Jul 14, 2014 at 6:01 PM, Will Benton wi

preferred Hive/Hadoop environment for generating golden test outputs

2014-07-17 Thread Will Benton
Hi all, What's the preferred environment for generating golden test outputs for new Hive tests? In particular: * what Hadoop version and Hive version should I be using, * are there particular distributions people have run successfully, and * are there any system properties or environment

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-08-31 Thread Will Benton
- Original Message - dev/run-tests fails two tests (1 Hive, 1 Kafka Streaming) for me locally on 1.1.0-rc3. Does anyone else see that? It may be my env. Although I still see the Hive failure on Debian too: [info] - SET commands semantics for a HiveContext *** FAILED *** [info]

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-09-02 Thread Will Benton
but will take another look later this week. best, wb - Original Message - From: Sean Owen so...@cloudera.com To: Will Benton wi...@redhat.com Cc: Patrick Wendell pwend...@gmail.com, dev@spark.apache.org Sent: Sunday, August 31, 2014 12:18:42 PM Subject: Re: [VOTE] Release Apache Spark 1.1.0

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-09-02 Thread Will Benton
+1 Tested Scala/MLlib apps on Fedora 20 (OpenJDK 7) and OS X 10.9 (Oracle JDK 8). best, wb - Original Message - From: Patrick Wendell pwend...@gmail.com To: dev@spark.apache.org Sent: Saturday, August 30, 2014 5:07:52 PM Subject: [VOTE] Release Apache Spark 1.1.0 (RC3) Please

Re: Question about SparkSQL and Hive-on-Spark

2014-09-23 Thread Will Benton
Hi Yi, I've had some interest in implementing windowing and rollup in particular for some of my applications but haven't had them on the front of my plate yet. If you need them as well, I'm happy to start taking a look this week. best, wb - Original Message - From: Yi Tian

Re: Question about SparkSQL and Hive-on-Spark

2014-09-24 Thread Will Benton
/pull/1567 As far as windowing, I'll be developing my own test cases but would appreciate it if you could also share some kinds of queries you're interested in so that I can incorporate them as well. best, wb - Original Message - From: Yi Tian tianyi.asiai...@gmail.com To: Will Benton

Re: best IDE for scala + spark development?

2014-10-27 Thread Will Benton
I'll chime in as yet another user who is extremely happy with sbt and a text editor. (In my experience, running ack from the command line is usually just as easy and fast as using an IDE's find-in-project facility.) You can, of course, extend editors with Scala-specific IDE-like functionality

Re: not found: type LocalSparkContext

2015-01-20 Thread Will Benton
It's declared here: https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/LocalSparkContext.scala I assume you're already importing LocalSparkContext, but since the test classes aren't included in Spark packages, you'll also need to package them up in order to use
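One way to pull those test classes into your own build is via Spark's published test artifacts. This is a sketch only, assuming your Spark version publishes its test jars under the `tests` classifier (as 1.x releases did); adjust the version to match your deployment:

```scala
// build.sbt fragment: depend on spark-core plus its test jar,
// which contains LocalSparkContext and the other shared test helpers
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.2.0",
  "org.apache.spark" %% "spark-core" % "1.2.0" % "test" classifier "tests"
)
```

If the test jar isn't published for your version, the alternative is to copy `LocalSparkContext.scala` into your own test sources, since it is a small self-contained trait.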

Re: Standardized Spark dev environment

2015-01-20 Thread Will Benton
Hey Nick, I did something similar with a Docker image last summer; I haven't updated the images to cache the dependencies for the current Spark master, but it would be trivial to do so: http://chapeau.freevariable.com/2014/08/jvm-test-docker.html best, wb - Original Message -

Re: Easy way to convert Row back to case class

2015-05-08 Thread Will Benton
This might not be the easiest way, but it's pretty easy: you can use Row(field_1, ..., field_n) as a pattern in a case match. So if you have a data frame with foo as an Int column and bar as a String column and you want to construct instances of a case class that wraps these up, you can do
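A minimal sketch of the pattern described above, assuming a Spark 1.3-era DataFrame API; `Record` is a hypothetical case class and `df` a DataFrame defined elsewhere:

```scala
import org.apache.spark.sql.Row

// Hypothetical case class matching the DataFrame's schema
case class Record(foo: Int, bar: String)

// Row has an unapplySeq, so it works as a pattern in a case match.
// The typed patterns (foo: Int, bar: String) both destructure and cast.
val records = df.map {
  case Row(foo: Int, bar: String) => Record(foo, bar)
}
```

Note that the match will throw a MatchError at runtime if the schema doesn't line up with the pattern, so the column types need to be checked against the actual schema, not just the expected one.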

certification suite?

2016-04-28 Thread William Benton
Hi all, Does anyone happen to know what tests Databricks uses for the Spark distribution certification suite? Is it simply the tests that run as CI on Spark pull requests, or is there something more involved? The web site (

Re: SPIP: Spark on Kubernetes

2017-08-15 Thread William Benton
+1 (non-binding) On Tue, Aug 15, 2017 at 10:32 AM, Anirudh Ramanathan < fox...@google.com.invalid> wrote: > Spark on Kubernetes effort has been developed separately in a fork, and > linked back from the Apache Spark project as an experimental backend >

Re: mllib + SQL

2018-08-30 Thread William Benton
What are you interested in accomplishing? The spark.ml package has provided a machine learning API based on DataFrames for quite some time. If you are interested in mixing query processing and machine learning, this is certainly the best place to start. See here: