date:20170216

Re: Design document - MLlib's statistical package for DataFrames

2017-02-16 Thread bradc

Hi, While it is also missing in spark.mllib, I'd suggest adding cardinality as part of the Simple descriptive statistics for both spark.ml and spark.mllib? This is useful even for data in double precision FP to understand the "uniqueness" of the feature data. Cheers, Brad -- View this

Re: Spark Improvement Proposals

2017-02-16 Thread Ryan Blue

> [The shepherd] can advise on technical and procedural considerations for people outside the community The sentiment is good, but this doesn't justify requiring a shepherd for a proposal. There are plenty of people that wouldn't need this, would get feedback during discussion, or would ask a

Design document - MLlib's statistical package for DataFrames

2017-02-16 Thread Tim Hunter

Hello all, I have been looking at some of the missing items for complete feature parity between spark.ml and spark.mllib. Here is a proposal for porting mllib.stats, the descriptive statistics package:

Re: Structured Streaming Spark Summit Demo - Databricks people

2017-02-16 Thread Sam Elamin

Thanks Michael, it really was a great demo. I figured I needed to add a trigger to display the results. But Buraz from Databricks mentioned here that the display on this functionality won't be

Re: Spark Improvement Proposals

2017-02-16 Thread Sam Elamin

Hi Folks, I thought I'd chime in as someone new to the process, so feel free to disregard it if it doesn't make sense. I definitely agree that we need a new forum to identify or discuss changes, as JIRA isn't exactly the best place to do that; it's a bug tracker first and foremost. For example I was

Re: Structured Streaming Spark Summit Demo - Databricks people

2017-02-16 Thread Michael Armbrust

Thanks for your interest in Apache Spark Structured Streaming! There is nothing secret in that demo, though I did make some configuration changes in order to get the timing right (gotta have some dramatic effect :) ). Also I think the visualizations based on metrics output by the

Re: [build system] jenkins restart in ~1 hour

2017-02-16 Thread shane knapp

and we're back! :) On Thu, Feb 16, 2017 at 10:22 AM, shane knapp wrote: > we don't have many builds running right now, and i need to restart the > daemon quickly to enable a new plugin. > > i'll wait until the pull request builder jobs are finished and then > (gently) kick

[build system] jenkins restart in ~1 hour

2017-02-16 Thread shane knapp

we don't have many builds running right now, and i need to restart the daemon quickly to enable a new plugin. i'll wait until the pull request builder jobs are finished and then (gently) kick jenkins. updates as they come, shane (who's always nervous about touching this house of cards)

Re: File JIRAs for all flaky test failures

2017-02-16 Thread Reynold Xin

Josh's tool should give enough signal there already. I don't think we need some manual process to document them. If you want to work on those that'd be great. I bet you will get a lot of love because all developers hate flaky tests. On Thu, Feb 16, 2017 at 6:19 PM, Saikat Kanjilal

Re: File JIRAs for all flaky test failures

2017-02-16 Thread Saikat Kanjilal

I am specifically suggesting documenting a list of the flaky tests and fixing them, that's all. To organize the effort I suggested tackling this by module. Your second sentence is what I was trying to gauge from the community before putting any more effort into this.

Re: Spark Improvement Proposals

2017-02-16 Thread Cody Koeninger

Reynold, thanks, LGTM. Sean, great concerns. I agree that behavior is largely cultural and writing down a process won't necessarily solve any problems one way or the other. But one outwardly visible change I'm hoping for out of this is a way for people who have a stake in Spark, but can't follow

Re: File JIRAs for all flaky test failures

2017-02-16 Thread Sean Owen

I'm not sure what you're specifically suggesting. Of course flaky tests are bad and they should be fixed, and people do. Yes, some are pretty hard to fix because they are rarely reproducible if at all. If you want to fix, fix; there's nothing more to it. I don't perceive flaky tests to be a

Re: Spark Improvement Proposals

2017-02-16 Thread Sean Owen

The text seems fine to me. Really, this is not describing a fundamentally new process, which is good. We've always had JIRAs, we've always been able to call a VOTE for a big question. This just writes down a sensible set of guidelines for putting those two together when a major change is proposed.

Re: Spark Improvement Proposals

2017-02-16 Thread Ryan Blue

The current proposal seems process-heavy to me. That's not necessarily bad, but there are a couple areas I haven't seen discussed. Why is there a shepherd? If the person proposing a change has a good idea, I don't see why one is either a good idea or necessary. The result of this requirement is

Re: File JIRAs for all flaky test failures

2017-02-16 Thread Saikat Kanjilal

I'd just like to follow up again on this thread: should we devote some energy to fixing unit tests based on module? There wasn't much interest in this last time, but given the nature of this thread I'd be willing to deep dive into this again with some help.

Re: Spark Job Performance monitoring approaches

2017-02-16 Thread Saikat Kanjilal

There's also this: https://github.com/databricks/spark-perf GitHub - databricks/spark-perf: Performance tests for Spark github.com Sweeps sets of

Re: File JIRAs for all flaky test failures

2017-02-16 Thread Reynold Xin

What exactly is the issue? I've been working on Spark dev for a long time and very rarely do I actually run into an issue that only manifests on Jenkins but not locally. I don't have some magic local setup either. We should definitely cut down test flakiness. On Thu, Feb 16, 2017 at 5:26 PM,

Re: Spark Improvement Proposals

2017-02-16 Thread Reynold Xin

Updated. Any feedback from other community members? On Wed, Feb 15, 2017 at 2:53 AM, Cody Koeninger wrote: > Thanks for doing that. > > Given that there are at least 4 different Apache voting processes, > "typical Apache vote process" isn't meaningful to me. > > I think the

18 matches