Re: trying to understand structured streaming aggregation with watermark and append outputmode

2018-05-29 Thread Koert Kuipers
Let me ask this another way: if I run this program and then feed it a single value (via nc), it returns a single result, which is an empty batch. It will not return anything else after that, no matter how long I wait. This only happens with watermarking and append output mode. What do I do to
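For context, here is a minimal sketch of the kind of program being described (the original code is not shown in the digest, so the source, column names, and durations below are assumptions). The behavior is expected: in append mode, a windowed aggregate row is only emitted once the watermark passes the end of its window, and the watermark only advances when newer events arrive, so a single value followed by silence can never be finalized.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder
  .master("local[2]")
  .appName("watermark-append-sketch")
  .getOrCreate()
import spark.implicits._

// Socket source fed by `nc -lk 9999`; includeTimestamp adds an event-time column.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .option("includeTimestamp", true)
  .load() // columns: value: String, timestamp: Timestamp

// Append mode emits a window's row only after the watermark
// (max event time seen, minus 10 minutes) moves past the window's end.
// The watermark advances only when *later* events arrive, so a single
// input is never finalized: the first batch is empty and stays empty.
val counts = lines
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window($"timestamp", "5 minutes"), $"value")
  .count()

counts.writeStream
  .outputMode("append")
  .format("console")
  .start()
  .awaitTermination()
```

To see the first row emitted, send a second value more than window length plus watermark delay later, or switch to `update` output mode, which emits provisional counts on every batch.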

Re: 答复: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Chetan Khatri
I see, thank you for the explanation, Linyuxin. On Wed, May 30, 2018 at 6:21 AM, Linyuxin wrote: > Hi, > > Why not group by first, then join? > > BTW, I don’t think there is any difference between ‘distinct’ and ‘group by’. > > Source code of 2.1: > > def distinct(): Dataset[T] = dropDuplicates() >

答复: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Linyuxin
Hi, why not group by first, then join? BTW, I don’t think there is any difference between ‘distinct’ and ‘group by’. Source code of 2.1:

def distinct(): Dataset[T] = dropDuplicates()
…
def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
  …
  Aggregate(groupCols, aggCols, logicalPlan)

Re: Pandas UDF for PySpark error. Big Dataset

2018-05-29 Thread Bryan Cutler
Can you share some of the code used, or at least the pandas_udf plus the stacktrace? Also, does decreasing your dataset size fix the OOM? On Mon, May 28, 2018, 4:22 PM Traku traku wrote: > Hi. > > I'm trying to use the new feature but I can't use it with a big dataset > (about 5 million rows).

Re: Spark 2.3 error on Kubernetes

2018-05-29 Thread Anirudh Ramanathan
Interesting. Perhaps you could try resolving service addresses from within a pod and seeing if there's some other issue causing intermittent failures in resolution. The steps here

Re: Spark 2.3 error on Kubernetes

2018-05-29 Thread purna pradeep
Anirudh, thanks for your response. I’m running a k8s cluster on AWS and the kube-dns pods are running fine. Also, as I mentioned, only 1 executor pod is running though I requested 5; the other 4 were killed with the below error, and I do have enough resources available. On Tue, May 29, 2018 at 6:28 PM

Re: Spark 2.3 error on Kubernetes

2018-05-29 Thread Anirudh Ramanathan
This looks to me like a kube-dns error that's causing the driver's DNS address to not resolve. It would be worth double-checking that kube-dns is indeed running (in the kube-system namespace). Often, with environments like minikube, kube-dns may exit/crashloop due to lack of resources. On Tue, May

Spark 2.3 error on Kubernetes

2018-05-29 Thread purna pradeep
Hello, I’m getting the below error when I spark-submit a Spark 2.3 app on Kubernetes v1.8.3; some of the executor pods were killed with the below error as soon as they came up: Exception in thread "main" java.lang.reflect.UndeclaredThrowableException at

Spark 2.3 error on kubernetes

2018-05-29 Thread Mamillapalli, Purna Pradeep
Hello, I’m getting the below intermittent error when I spark-submit a Spark 2.3 app on Kubernetes v1.8.3; some of the executor pods were killed with the below error as soon as they came up: Exception in thread "main" java.lang.reflect.UndeclaredThrowableException at

Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Chetan Khatri
Georg, sorry for the dumb question. Help me to understand: if I do DF.select(A,B,C,D).distinct(), that would be the same as the above groupBy without agg in SQL, right? On Wed, May 30, 2018 at 12:17 AM, Chetan Khatri wrote: > I don't want to get any aggregation, just want to know rather saying > distinct

Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Chetan Khatri
I don't want any aggregation; I just want to know whether there is a better approach than applying distinct to all columns. On Wed, May 30, 2018 at 12:16 AM, Irving Duran wrote: > Unless you want to get a count, yes. > > Thank You, > > Irving Duran > > > On Tue, May 29, 2018 at 1:44 PM Chetan

Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Irving Duran
Unless you want to get a count, yes. Thank You, Irving Duran On Tue, May 29, 2018 at 1:44 PM Chetan Khatri wrote: > Georg, I just want to double check that someone wrote MSSQL Server script > where it's groupby all columns. What is alternate best way to do distinct > all columns ? > > > > On

Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Chetan Khatri
Georg, I just want to double check: someone wrote an MSSQL Server script where it groups by all columns. What is the best alternative way to do distinct over all columns? On Wed, May 30, 2018 at 12:08 AM, Georg Heiler wrote: > Why do you group if you do not want to aggregate? > Isn't this the same as

Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Georg Heiler
Why do you group if you do not want to aggregate? Isn't this the same as select distinct? Chetan Khatri wrote on Tue., May 29, 2018 at 20:21: > All, > > I have a scenario like this in MSSQL Server SQL where I need to do groupBy > without an Agg function: > > Pseudocode: > > > select

GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Chetan Khatri
All, I have a scenario like this in MSSQL Server SQL where I need to do groupBy without an Agg function. Pseudocode:

select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
from student as m
inner join general_register g on m.student_id = g.student_id
group by
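The question above (GROUP BY over all selected columns, with no aggregate) is semantically just DISTINCT, as the rest of the thread concludes. A minimal plain-Scala illustration of why (collections rather than Spark Datasets; the rows are hypothetical):

```scala
// A row type standing in for the columns selected in the SQL above.
case class Student(id: Int, name: String, std: String)

val rows = List(
  Student(1, "a", "x"),
  Student(1, "a", "x"), // duplicate, e.g. produced by the inner join
  Student(2, "b", "y")
)

// "GROUP BY all columns" keeps one representative per distinct key tuple...
val viaGroupBy = rows.groupBy(identity).keys.toSet

// ...which is exactly what distinct computes.
val viaDistinct = rows.distinct.toSet

assert(viaGroupBy == viaDistinct) // same two rows either way
```

In Spark itself, `df.select("a", "b").distinct()` (or `df.dropDuplicates(...)`) is the idiomatic translation; as Linyuxin points out above, `distinct` is literally implemented as `dropDuplicates`, which plans an `Aggregate` over all the columns.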

Re: Positive log-likelihood with Gaussian mixture

2018-05-29 Thread Simon Dirmeier
Hey, sorry for the late reply. I cannot share the data, but the problem can be reproduced easily, as below. I wanted to check with sklearn and observed similar behaviour, i.e. a positive per-sample average log-likelihood
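A note on why this need not be a bug (a general property of continuous models, not specific to this thread's data): the reported value is an average log *density*, not a log probability, and densities can exceed 1, so their logarithm can be positive. For a univariate Gaussian:

```latex
% Log-density of a univariate Gaussian:
\[
  \log \mathcal{N}(x \mid \mu, \sigma^2)
    = -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}
\]
% At x = \mu with \sigma = 0.1, the density is
% 1/(\sqrt{2\pi}\,\sigma) \approx 3.99, so the per-sample
% log-likelihood is \log 3.99 \approx 1.38 > 0.
```

A tightly concentrated mixture component (small covariance) can therefore push the per-sample average log-likelihood above zero in both Spark and sklearn.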