date:20180418

Re: [discuss][data source v2] remove type parameter in DataReader/WriterFactory

2018-04-18 Thread Wenchen Fan

First of all, I think we all agree that data source v2 API should at least support InternalRow and ColumnarBatch. With this assumption, the current API has 2 problems: *First problem*: We use mixin traits to add support for different data formats. The mixin traits define API to return

Re: Possible SPIP to improve matrix and vector column type support

2018-04-18 Thread Leif Walsh

I agree we should reuse as much as possible. For PySpark, I think the obvious choices already made, Breeze and numpy arrays, make a lot of sense. I'm not sure about the other language bindings and would defer to others. I was under the impression that UDTs were gone and (probably?) not coming

Re: GLM Poisson Model - Deviance calculations

2018-04-18 Thread svattig

Yes, I'm referring to that deviance method. It fails whenever y is 0. I think R's deviance calculation checks whether y is 0 and assigns 1 to y in such cases. The GLM regression summary object has a few deviances, such as nulldeviance, residualdeviance and deviance. You might want to check

Re: GLM Poisson Model - Deviance calculations

2018-04-18 Thread Joseph PENG

Are you referring to this? override def deviance(y: Double, mu: Double, weight: Double): Double = { 2.0 * weight * (y * math.log(y / mu) - (y - mu)) } I'm not sure how R handles this, but my guess is that it may add a small number, e.g. 0.5, to the numerator and denominator. If you can

Re: [discuss][data source v2] remove type parameter in DataReader/WriterFactory

2018-04-18 Thread Joseph Torres

The fundamental difficulty seems to be that there's a spurious "round-trip" in the API. Spark inspects the source to determine what type it's going to provide, picks an appropriate method according to that type, and then calls that method on the source to finally get what it wants. Pushing this
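The "round-trip" described in this message can be sketched roughly as follows. This is only an illustration of the pattern, not Spark's data source v2 API: every name here (`Row`, `Batch`, `Source`, `RowScan`, `BatchScan`, `Planner`) is hypothetical, and the truncated message does not show what replacement was proposed.

```scala
// Hypothetical stand-ins for the two data formats; not Spark classes.
final case class Row(values: Seq[Any])
final case class Batch(rows: Seq[Row])

// A source advertises the formats it supports by mixing in traits.
trait Source
trait RowScan extends Source { def readRows(): Iterator[Row] }
trait BatchScan extends Source { def readBatches(): Iterator[Batch] }

object Planner {
  // The round-trip: inspect the source to learn which format it provides,
  // pick the matching method, then call back into the source to get the data.
  def scan(source: Source): Iterator[Row] = source match {
    case b: BatchScan => b.readBatches().flatMap(_.rows)
    case r: RowScan   => r.readRows()
    case _            => throw new IllegalArgumentException("source supports no known format")
  }
}
```

In this sketch the engine needs one `case` per format, so each new format means another trait and another branch.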

Re: Possible SPIP to improve matrix and vector column type support

2018-04-18 Thread Joseph Bradley

Thanks for the thoughts! We've gone back and forth quite a bit about local linear algebra support in Spark. For reference, there have been some discussions here: https://issues.apache.org/jira/browse/SPARK-6442 https://issues.apache.org/jira/browse/SPARK-16365

Re: Sort-merge join improvement

2018-04-18 Thread Petar Zecevic

As instructed offline, I opened a JIRA for this: https://issues.apache.org/jira/browse/SPARK-24020 I will create a pull request soon. On 4/17/2018 at 6:21 PM, Petar Zecevic wrote: Hello everybody. We (at University of Zagreb and University of Washington) have implemented an optimization of

Re: GLM Poisson Model - Deviance calculations

2018-04-18 Thread Sean Owen

GeneralizedLinearRegression.ylogy seems to handle this case; can you be more specific about where the log(0) happens? That's what should be fixed, right? If so, then a JIRA and PR are the right way to proceed. On Wed, Apr 18, 2018 at 2:37 PM svattig wrote: > In
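To make the log(0) issue in this thread concrete, here is a minimal standalone sketch of the Poisson unit deviance quoted earlier in the thread, with and without a guard for y = 0. It assumes the usual convention that y * log(y / mu) is taken as 0 when y is 0 (its limit as y approaches 0). The object name `PoissonDeviance` and the method `naiveDeviance` are made up for illustration; this is not a copy of Spark's source.

```scala
object PoissonDeviance {
  // Assumed convention: y * log(y / mu) is 0 when y == 0.
  def ylogy(y: Double, mu: Double): Double =
    if (y == 0.0) 0.0 else y * math.log(y / mu)

  // Guarded form of the formula quoted in the thread.
  def deviance(y: Double, mu: Double, weight: Double): Double =
    2.0 * weight * (ylogy(y, mu) - (y - mu))

  // Unguarded form: for y == 0 this evaluates 0.0 * log(0.0),
  // i.e. 0.0 * -Infinity, which is NaN in IEEE floating point.
  def naiveDeviance(y: Double, mu: Double, weight: Double): Double =
    2.0 * weight * (y * math.log(y / mu) - (y - mu))
}
```

With y = 0 and mu = 1.5, the guarded version yields a finite value while the unguarded one yields NaN, which then propagates into any sum over observations.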

Re: [discuss][data source v2] remove type parameter in DataReader/WriterFactory

2018-04-18 Thread Ryan Blue

Wenchen, can you explain a bit more clearly why this is necessary? The pseudo-code you used doesn’t clearly demonstrate why. Why couldn’t this be handled with inheritance from an abstract Factory class? Why define all of the createXDataReader methods, but make the DataFormat a field in the

GLM Poisson Model - Deviance calculations

2018-04-18 Thread svattig

In Spark 2.3, when a Poisson model is fit (with labelCol having a few counts of 0), the deviance calculations are broken as a result of log(0). I think this is the same as in Spark 2.2. But the new toString method in Spark 2.3's GeneralizedLinearRegressionTrainingSummary class is throwing an error

10 matches
