Re: eager execution and debuggability

2018-05-09 Thread Tim Hunter
The repr() trick is neat when working in a notebook. When working in a library, I used to use an evaluate(dataframe) -> DataFrame function that simply forces the materialization of a DataFrame. As Reynold mentions, this is very convenient when working on a lot of chained UDFs, and it is a standard

[ml] Deep learning talks at the Spark Summit Europe

2017-10-10 Thread Tim Hunter
Hello all, following the last Summit, there will be a couple of exciting talks about deep learning and Spark at the next Spark Summit in Dublin.  - Deep Dive Into Deep Learning Pipelines, in which we will go even deeper into the technical aspects for an hour-long session  - Apache Spark and

Re: [VOTE][SPIP] SPARK-21866 Image support in Apache Spark

2017-09-28 Thread Tim Hunter
i-dimensional tensors >>> too. >>> >>> Matei >>> >>> > On Sep 23, 2017, at 7:27 AM, Yanbo Liang <yblia...@gmail.com> wrote: >>> > >>> > +1 >>> > >>> > On Sat, Sep 23, 2017 at 7:08 PM, Noman Khan <

[VOTE][SPIP] SPARK-21866 Image support in Apache Spark

2017-09-21 Thread Tim Hunter
Hello community, I would like to call for a vote on SPARK-21866. It is a short proposal that has important applications for image processing and deep learning. Joseph Bradley has offered to be the shepherd. JIRA ticket: https://issues.apache.org/jira/browse/SPARK-21866 PDF version:

SPIP: SPARK-21866 Image support in Apache Spark

2017-09-05 Thread Tim Hunter
Hello community, I would like to start a discussion about adding support for images in Spark. We will follow up with a formal vote in two weeks. Please feel free to comment on the JIRA ticket too. JIRA ticket: https://issues.apache.org/jira/browse/SPARK-21866 PDF version:

Re: Question on Spark's graph libraries roadmap

2017-03-13 Thread Tim Hunter
Hello Enzo, since this question is also relevant to Spark, I will answer it here. The goal of GraphFrames is to provide graph capabilities along with excellent integration with the rest of the Spark ecosystem (using modern APIs such as DataFrames). As you seem to be well aware, a large number of

Re: [Spark Namespace]: Expanding Spark ML under Different Namespace?

2017-02-24 Thread Tim Hunter
Regarding logging, GraphFrames makes a simple wrapper this way: https://github.com/graphframes/graphframes/blob/master/src/main/scala/org/graphframes/Logging.scala Regarding the UDTs, they have been hidden to be reworked for Datasets, the reasons being detailed here [1]. Can you describe your

Re: Feedback on MLlib roadmap process proposal

2017-02-23 Thread Tim Hunter
As Sean wrote very nicely above, the changes made to Spark are decided in an organic fashion based on the interests and motivations of the committers and contributors. The case of deep learning is a good example. There is a lot of interest, and the core algorithms could be implemented without too

Re: Design document - MLlib's statistical package for DataFrames

2017-02-17 Thread Tim Hunter
Hi Brad, this task is focusing on moving the existing algorithms, so that we are not held up by parity issues. Do you have some paper suggestions for cardinality? I do not think there is a feature request on JIRA either. Tim On Thu, Feb 16, 2017 at 2:21 PM, bradc wrote: >

Design document - MLlib's statistical package for DataFrames

2017-02-16 Thread Tim Hunter
Hello all, I have been looking at some of the missing items for complete feature parity between spark.ml and spark.mllib. Here is a proposal for porting mllib.stats, the descriptive statistics package:

Re: Spark Improvement Proposals

2017-01-05 Thread Tim Hunter
Hi Cody, thank you for bringing up this topic. I agree it is very important to keep a cohesive community around some common, fluid goals. Here are a few comments about the current document: 1. name: it should not overlap with an existing one such as SIP. Can you imagine someone trying to discuss

GraphFrames 0.2.0 released

2016-08-16 Thread Tim Hunter
Hello all, I have released version 0.2.0 of the GraphFrames package. Apart from a few bug fixes, it is the first release published for Spark 2.0 and both Scala 2.10 and 2.11. Please let us know if you have any comments or questions. It is available as a Spark package:

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Tim Hunter
+1 This release passes all tests on the graphframes and tensorframes packages. On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger wrote: > If we're considering backporting changes for the 0.8 kafka > integration, I am sure there are people who would like to get > >

Request for comments: Tensorframes, an integration library between TensorFlow and Spark DataFrames

2016-03-19 Thread Tim Hunter

Introducing spark-sklearn, a scikit-learn integration package for Spark

2016-02-10 Thread Tim Hunter
Hello community, Joseph and I would like to introduce a new Spark package that should be useful for Python users who depend on scikit-learn. Among other tools: - train and evaluate multiple scikit-learn models in parallel. - convert Spark's DataFrames seamlessly into NumPy arrays -