Pyspark elementwise matrix multiplication

2019-02-08 Thread Simon Dirmeier
Dear all, I wonder if there is a way to take the elementwise-product of 2 matrices (RowMatrix, DistributedMatrix, ..) in pyspark? I cannot find a good answer/API entry on the topic. Thank you for all the help. Best, Simon
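One way to think about the question, sketched in plain Python rather than Spark: if both matrices are held as coordinate-style entries keyed by (row, column), the elementwise product is a join on that key followed by a multiplication. The function name and the dict representation below are illustrative assumptions, not a pyspark API; on a cluster the same idea could presumably be expressed as a join over the `entries` of two `CoordinateMatrix` objects.

```python
# Sparse matrices as {(row, col): value} dicts, mirroring coordinate-style entries.
# This is a hypothetical stand-in for distributed matrix entries, not Spark code.

def elementwise_product(a, b):
    """Hadamard product: a position survives only if both matrices store it."""
    shared_keys = a.keys() & b.keys()  # the "join" on (row, col)
    return {key: a[key] * b[key] for key in shared_keys}


a = {(0, 0): 1.0, (0, 1): 2.0, (1, 1): 3.0}
b = {(0, 0): 4.0, (1, 0): 5.0, (1, 1): 6.0}

product = elementwise_product(a, b)
```

Positions missing from either matrix are treated as zero, so they drop out of the result.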

Element-wise multiplication in Pyspark

2019-02-08 Thread Simon Dirmeier
Dear all, is there a way to take the elementwise product of 2 matrices in pyspark, e.g. RowMatrix, DistributedMatrix? I cannot find a good answer/API entry. Thanks for all the help. Best, Simon

P-values logistic regression

2019-01-09 Thread Simon Dirmeier
Dear all, when fitting a logistic regression model, for some data no p-values are computed. I cannot really tell under what circumstances this happens, though. Is there an explanation why and when this might be the case? Thank you, Simon

p-values logistic regression

2018-12-30 Thread Simon Dirmeier
Dear all, when fitting a logistic model in pyspark (https://spark.apache.org/docs/2.2.0/ml-classification-regression.html#binomial-logistic-regression) in many cases, the summary does not contain p-values, or rather calling the summary throws an exception (even though in these cases

Re: Positive log-likelihood with Gaussian mixture

2018-05-30 Thread Simon Dirmeier
On Tue, 29 May 2018 at 12:08 Simon Dirmeier <simon.dirme...@web.de> wrote: Hey, sorry for the late reply. I cannot share the data but the problem can be reproduced eas

Re: Positive log-likelihood with Gaussian mixture

2018-05-29 Thread Simon Dirmeier
Hey, sorry for the late reply. I cannot share the data but the problem can be reproduced easily, like below. I wanted to check with sklearn and observe a similar behaviour, i.e. a positive per-sample average log-likelihood

Positive log-likelihood with Gaussian mixture

2018-05-24 Thread Simon Dirmeier
Dear all, I am fitting a very trivial GMM with 2-10 components on 100 samples and 5 features in pyspark and observe some of the log-likelihoods being positive (see below). I don't understand how this is possible. Is this a bug or intended behaviour? Furthermore, for different seeds,
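A possible explanation, offered as a sketch and not as a diagnosis of the data in this thread: for continuous data the likelihood is a probability density, not a probability, and a density can exceed 1 when the variance is small. Its logarithm is then positive. The helper name and the sample values below are made up for illustration.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate normal distribution."""
    var = std * std
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


# Tightly clustered, hypothetical samples: the fitted density is tall and narrow.
samples = [0.00, 0.01, -0.01, 0.02, -0.02]
avg_loglik = sum(gaussian_logpdf(x, 0.0, 0.05) for x in samples) / len(samples)

# For comparison: a standard normal evaluated at its mode has a negative log-density.
wide_loglik = gaussian_logpdf(0.0, 0.0, 1.0)
```

Here `avg_loglik` is positive while `wide_loglik` is negative, so the sign of a log-likelihood depends on the scale of the data and does not by itself indicate a bug.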

Re: Zero Coefficient in logistic regression

2017-10-24 Thread Simon Dirmeier
-squared statistic only used in categorical features. It looks not proper here. Thanks! On Tue, Oct 24, 2017 at 5:13 PM, Simon Dirmeier <simon.dirme...@web.de> wrote: Hey, as far as I know feature selection using a chi-squared statistic

Re: Zero Coefficient in logistic regression

2017-10-24 Thread Simon Dirmeier
Hey, as far as I know, feature selection using a chi-squared statistic can only be done on categorical features and not on possibly continuous ones. Furthermore, since your logistic model doesn't use any regularization, you should be fine here. So I'd check the ChiSqSelector and possibly

Pyspark define UDF for windows

2017-09-20 Thread Simon Dirmeier
Dear all, I am trying to partition a DataFrame into windows and then, for every column and window, apply a custom function (udf) using Spark's Python interface. Within that function I cast a column of a window into an m x n matrix to do a median-polish and afterwards return a list again. This
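The function body described above might look roughly like the following plain-Python sketch. The original function is not shown in the message, so the name `median_polish`, its parameters, the row-major layout, and the fixed iteration count are all assumptions; the Spark window and UDF wiring is deliberately left out.

```python
from statistics import median


def median_polish(values, n_rows, n_cols, n_iter=10):
    """Reshape a flat list into n_rows x n_cols, sweep out row and column
    medians repeatedly, and return the residuals as a flat list again."""
    # Row-major reshape of the flat input (an assumed layout).
    m = [list(values[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]
    for _ in range(n_iter):
        # Subtract each row's median from that row.
        for row in m:
            row_med = median(row)
            for c in range(n_cols):
                row[c] -= row_med
        # Subtract each column's median from that column.
        for c in range(n_cols):
            col_med = median(m[r][c] for r in range(n_rows))
            for r in range(n_rows):
                m[r][c] -= col_med
    # Flatten back to a list, matching the "return a list again" step.
    return [x for row in m for x in row]


# A purely additive table (row effect + column effect) leaves zero residuals.
residuals = median_polish([1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0], 3, 3)
```

Keeping the computation in a pure function like this makes it testable without a Spark session before it is wrapped in a UDF.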