Re: trying to understand structured streaming aggregation with watermark and append outputmode

2018-05-29 Thread Koert Kuipers
Let me ask this another way: if I run this program and then feed it a
single value (via nc), it returns a single result, which is an empty batch.
It will not return anything else after that, no matter how long I wait.

This only happens with watermarking and append output mode.

What do I do to correct this behavior?
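For reference, here is a minimal sketch of one way to see output sooner, assuming intermediate results are acceptable: switch from Append to Update output mode and add a processing-time trigger. This is only an illustration derived from the program quoted below, not a confirmed fix, and the port value is assumed since the original post elides it. In Append mode a windowed aggregate is emitted only after the watermark passes the end of the window, and the watermark only advances when later batches arrive, which is consistent with the two-entry delay described below.

import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.{OutputMode, Trigger}
import spark.implicits._

val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)  // assumed port; use whatever port nc is listening on
  .load()

val query = lines
  .withColumn("time", current_timestamp)
  .withWatermark("time", "1 second")
  .groupBy(window($"time", "1 second"))
  .agg(collect_list("value") as "value")
  .writeStream
  .format("console")
  // Update mode emits a window every time it changes instead of waiting for the
  // watermark to close it, so results show up in the same batch as the input,
  // at the cost of possibly seeing a window more than once.
  .outputMode(OutputMode.Update)
  .trigger(Trigger.ProcessingTime("1 second"))
  .start()

query.awaitTermination()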


On Mon, May 28, 2018 at 6:16 PM, Koert Kuipers  wrote:

> Hello all,
> I am just playing with structured streaming aggregations for the first time.
> This is the little program I run inside sbt:
>
> // OutputMode and the $-column syntax require these extra imports
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.streaming.OutputMode
> import spark.implicits._
>
> val lines = spark.readStream
>   .format("socket")
>   .option("host", "localhost")
>   .option("port", )
>   .load()
>
> val query = lines
>   .withColumn("time", current_timestamp)
>   .withWatermark("time", "1 second")
>   .groupBy(window($"time", "1 second"))
>   .agg(collect_list("value") as "value")
>   .withColumn("windowstring", $"window" as "string")
>   .writeStream
>   .format("console")
>   .outputMode(OutputMode.Append)
>   .start()
>
> query.awaitTermination()
>
> Before I start it I create a little server with nc:
> $ nc -lk 
>
> After it starts I simply type in a single character every 20 seconds or so
> inside nc and hit enter. My characters are 1, 2, 3, etc.
>
> The thing I don't understand is that it comes back with the correct responses,
> but with a delay in terms of entries (not time). After the first 2
> characters it comes back with empty aggregations, and then for every next
> character it comes back with the response for 2 characters ago. So when I
> hit 3 it comes back with the response for 1.
>
> Not very realtime :(
>
> Any idea why?
>
> I would like it to respond to my input 1 with the relevant response
> for that input (after the window and watermark have expired, of course, so
> within 2 seconds).
>
> I tried adding a trigger of 1 second but that didn't help either.
>
> Below is the output with my inputs inserted using '<= ', so '<= 1'
> means I hit 1 and then enter.
>
>
> <= 1
> ---
> Batch: 0
> ---
> +--+-++
> |window|value|windowstring|
> +--+-++
> +--+-++
>
> <= 2
> ---
> Batch: 1
> ---
> +--+-++
> |window|value|windowstring|
> +--+-++
> +--+-++
>
> <= 3
> Batch: 2
> ---
> ++-++
> |  window|value|windowstring|
> ++-++
> |[2018-05-28 18:00...|  [1]|[2018-05-28 18:00...|
> ++-++
>
> <= 4
> ---
> Batch: 3
> ---
> ++-++
> |  window|value|windowstring|
> ++-++
> |[2018-05-28 18:00...|  [2]|[2018-05-28 18:00...|
> ++-++
>
> <= 5
> ---
> Batch: 4
> ---
> ++-++
> |  window|value|windowstring|
> ++-++
> |[2018-05-28 18:01...|  [3]|[2018-05-28 18:01...|
> ++-++
>
>
>


Re: Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Chetan Khatri
I see. Thank you for the explanation, Linyuxin.

On Wed, May 30, 2018 at 6:21 AM, Linyuxin  wrote:

> Hi,
>
> Why not group by first then join?
>
> BTW, I don’t think there is any difference between ‘distinct’ and ‘group by’.
>
>
>
> Source code of Spark 2.1:
>
> def distinct(): Dataset[T] = dropDuplicates()
>
> …
>
> def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
>
> …
>
> Aggregate(groupCols, aggCols, logicalPlan)
> }
>
>
>
>
>
>
>
>
>
> From: Chetan Khatri [mailto:chetan.opensou...@gmail.com]
> Sent: May 30, 2018 2:52
> To: Irving Duran 
> Cc: Georg Heiler ; user <
> user@spark.apache.org>
> Subject: Re: GroupBy in Spark / Scala without Agg functions
>
>
>
> Georg, sorry for the dumb question. Help me to understand: if I do
> DF.select(A,B,C,D).distinct(), would that be the same as the above groupBy
> without agg in SQL?
>
>
>
> On Wed, May 30, 2018 at 12:17 AM, Chetan Khatri <
> chetan.opensou...@gmail.com> wrote:
>
> I don't want any aggregation; I just want to know whether there is a better
> approach than applying distinct to all columns.
>
>
>
> On Wed, May 30, 2018 at 12:16 AM, Irving Duran 
> wrote:
>
> Unless you want to get a count, yes.
>
>
> Thank You,
>
> Irving Duran
>
>
>
>
>
> On Tue, May 29, 2018 at 1:44 PM Chetan Khatri 
> wrote:
>
> Georg, I just want to double-check: someone wrote an MSSQL Server script
> that groups by all columns. What is the best alternative way to do distinct
> on all columns?
>
>
>
>
>
>
>
> On Wed, May 30, 2018 at 12:08 AM, Georg Heiler 
> wrote:
>
> Why do you group if you do not want to aggregate?
>
> Isn't this the same as select distinct?
>
>
>
> Chetan Khatri  wrote on Tue., May 29, 2018
> at 20:21:
>
> All,
>
>
>
> I have a scenario like this in MSSQL Server SQL where I need to do a groupBy
> without an agg function:
>
> Pseudocode:
>
> select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
> from student as m
> inner join general_register g on m.student_id = g.student_id
> group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
>
> I tried doing this in Spark but I am not able to get a DataFrame as the return
> value. How could this kind of thing be done in Spark?
>
>
>
> Thanks
>
>
>
>
>
>
>


Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Linyuxin
Hi,
Why not group by first then join?
BTW, I don’t think there is any difference between ‘distinct’ and ‘group by’.

Source code of Spark 2.1:
def distinct(): Dataset[T] = dropDuplicates()
…
def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan {
…
Aggregate(groupCols, aggCols, logicalPlan)
}
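As a rough sketch of the "group by first, then join" idea (only an illustration; it assumes two DataFrames named studentDF and generalRegisterDF with the columns from the pseudocode quoted below, neither of which is defined in this thread):

// Dedupe the student side before joining, so no aggregate functions are needed
// and the join itself does not multiply rows.
val students = studentDF
  .select("student_id", "student_name", "student_std", "student_group", "student_dob")
  .dropDuplicates()  // planned as an Aggregate, per the source quoted above

// Keep only students that appear in general_register; the result is a plain
// DataFrame, matching the intent of the original SQL.
val result = students
  .join(generalRegisterDF.select("student_id").distinct(), Seq("student_id"), "inner")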




From: Chetan Khatri [mailto:chetan.opensou...@gmail.com]
Sent: May 30, 2018 2:52
To: Irving Duran 
Cc: Georg Heiler ; user 
Subject: Re: GroupBy in Spark / Scala without Agg functions

Georg, sorry for the dumb question. Help me to understand: if I do
DF.select(A,B,C,D).distinct(), would that be the same as the above groupBy
without agg in SQL?

On Wed, May 30, 2018 at 12:17 AM, Chetan Khatri 
mailto:chetan.opensou...@gmail.com>> wrote:
I don't want any aggregation; I just want to know whether there is a better
approach than applying distinct to all columns.

On Wed, May 30, 2018 at 12:16 AM, Irving Duran 
mailto:irving.du...@gmail.com>> wrote:
Unless you want to get a count, yes.

Thank You,

Irving Duran


On Tue, May 29, 2018 at 1:44 PM Chetan Khatri 
mailto:chetan.opensou...@gmail.com>> wrote:
Georg, I just want to double-check: someone wrote an MSSQL Server script that
groups by all columns. What is the best alternative way to do distinct on all
columns?



On Wed, May 30, 2018 at 12:08 AM, Georg Heiler 
mailto:georg.kf.hei...@gmail.com>> wrote:
Why do you group if you do not want to aggregate?
Isn't this the same as select distinct?

Chetan Khatri mailto:chetan.opensou...@gmail.com>>
wrote on Tue., May 29, 2018 at 20:21:
All,

I have a scenario like this in MSSQL Server SQL where I need to do a groupBy
without an agg function:

Pseudocode:

select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
from student as m
inner join general_register g on m.student_id = g.student_id
group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob

I tried doing this in Spark but I am not able to get a DataFrame as the return
value. How could this kind of thing be done in Spark?

Thanks





Re: Pandas UDF for PySpark error. Big Dataset

2018-05-29 Thread Bryan Cutler
Can you share some of the code used, or at least the pandas_udf plus the
stack trace? Also, does decreasing your dataset size fix the OOM?

On Mon, May 28, 2018, 4:22 PM Traku traku  wrote:

> Hi.
>
> I'm trying to use the new feature but I can't use it with a big dataset
> (about 5 million rows).
>
> I tried increasing executor memory, driver memory, and the partition number,
> but none of these helped me solve the problem.
>
> One of the executor tasks keeps increasing its shuffle memory until it fails.
>
> The error is generated by Arrow: unable to expand the buffer.
>
> Any idea?
>


Re: Spark 2.3 error on Kubernetes

2018-05-29 Thread Anirudh Ramanathan
Interesting.
Perhaps you could try resolving service addresses from within a pod and
seeing if there's some other issue causing intermittent failures in
resolution.
The steps here may be helpful.

On Tue, May 29, 2018 at 4:02 PM, purna pradeep 
wrote:

> Anirudh,
>
> Thanks for your response
>
> I’m running the k8s cluster on AWS and the kube-dns pods are running fine.
> Also, as I mentioned, only 1 executor pod is running even though I requested
> 5; the other 4 were killed with the error below, and I do have enough
> resources available.
>
> On Tue, May 29, 2018 at 6:28 PM Anirudh Ramanathan 
> wrote:
>
>> This looks to me like a kube-dns error that's causing the driver DNS
>> address to not resolve.
>> It would be worth double checking that kube-dns is indeed running (in the
>> kube-system namespace).
>> Often, with environments like minikube, kube-dns may exit/crashloop due
>> to lack of resource.
>>
>> On Tue, May 29, 2018 at 3:18 PM, purna pradeep 
>> wrote:
>>
>>> Hello,
>>>
>>> I’m getting the below error when I spark-submit a Spark 2.3 app on
>>> Kubernetes v1.8.3; some of the executor pods were killed with the below
>>> error as soon as they came up:
>>>
>>> Exception in thread "main" java.lang.reflect.
>>> UndeclaredThrowableException
>>>
>>> at org.apache.hadoop.security.UserGroupInformation.doAs(
>>> UserGroupInformation.java:1713)
>>>
>>> at org.apache.spark.deploy.SparkHadoopUtil.
>>> runAsSparkUser(SparkHadoopUtil.scala:64)
>>>
>>> at org.apache.spark.executor.
>>> CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.
>>> scala:188)
>>>
>>> at org.apache.spark.executor.
>>> CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.
>>> scala:293)
>>>
>>> at org.apache.spark.executor.
>>> CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
>>>
>>> Caused by: org.apache.spark.SparkException: Exception thrown in
>>> awaitResult:
>>>
>>> at org.apache.spark.util.ThreadUtils$.awaitResult(
>>> ThreadUtils.scala:205)
>>>
>>> at org.apache.spark.rpc.RpcTimeout.awaitResult(
>>> RpcTimeout.scala:75)
>>>
>>> at org.apache.spark.rpc.RpcEnv.
>>> setupEndpointRefByURI(RpcEnv.scala:101)
>>>
>>> at org.apache.spark.executor.
>>> CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(
>>> CoarseGrainedExecutorBackend.scala:201)
>>>
>>> at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(
>>> SparkHadoopUtil.scala:65)
>>>
>>> at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(
>>> SparkHadoopUtil.scala:64)
>>>
>>> at java.security.AccessController.doPrivileged(Native
>>> Method)
>>>
>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>>
>>> at org.apache.hadoop.security.UserGroupInformation.doAs(
>>> UserGroupInformation.java:1698)
>>>
>>> ... 4 more
>>>
>>> Caused by: java.io.IOException: Failed to connect to
>>> spark-1527629824987-driver-svc.spark.svc:7078
>>>
>>> at org.apache.spark.network.
>>> client.TransportClientFactory.createClient(TransportClientFactory.java:
>>> 245)
>>>
>>> at org.apache.spark.network.
>>> client.TransportClientFactory.createClient(TransportClientFactory.java:
>>> 187)
>>>
>>> at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(
>>> NettyRpcEnv.scala:198)
>>>
>>> at org.apache.spark.rpc.netty.
>>> Outbox$$anon$1.call(Outbox.scala:194)
>>>
>>> at org.apache.spark.rpc.netty.
>>> Outbox$$anon$1.call(Outbox.scala:190)
>>>
>>> at java.util.concurrent.FutureTask.run(FutureTask.
>>> java:266)
>>>
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(
>>> ThreadPoolExecutor.java:1149)
>>>
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>> ThreadPoolExecutor.java:624)
>>>
>>> at java.lang.Thread.run(Thread.java:748)
>>>
>>> Caused by: java.net.UnknownHostException: spark-1527629824987-driver-
>>> svc.spark.svc
>>>
>>> at java.net.InetAddress.getAllByName0(InetAddress.
>>> java:1280)
>>>
>>> at java.net.InetAddress.getAllByName(InetAddress.java:
>>> 1192)
>>>
>>> at java.net.InetAddress.getAllByName(InetAddress.java:
>>> 1126)
>>>
>>> at java.net.InetAddress.getByName(InetAddress.java:1076)
>>>
>>> at io.netty.util.internal.SocketUtils$8.run(SocketUtils.
>>> java:146)
>>>
>>> at io.netty.util.internal.SocketUtils$8.run(SocketUtils.
>>> java:143)
>>>
>>> at java.security.AccessController.doPrivileged(Native
>>> Method)
>>>
>>> at io.netty.util.internal.SocketUtils.addressByName(
>>> SocketUtils.java:143)
>>>
>>>  

Re: Spark 2.3 error on Kubernetes

2018-05-29 Thread purna pradeep
Anirudh,

Thanks for your response

I’m running the k8s cluster on AWS and the kube-dns pods are running fine.
Also, as I mentioned, only 1 executor pod is running even though I requested 5;
the other 4 were killed with the error below, and I do have enough resources
available.

On Tue, May 29, 2018 at 6:28 PM Anirudh Ramanathan 
wrote:

> This looks to me like a kube-dns error that's causing the driver DNS
> address to not resolve.
> It would be worth double checking that kube-dns is indeed running (in the
> kube-system namespace).
> Often, with environments like minikube, kube-dns may exit/crashloop due to
> lack of resource.
>
> On Tue, May 29, 2018 at 3:18 PM, purna pradeep 
> wrote:
>
>> Hello,
>>
>> I’m getting the below error when I spark-submit a Spark 2.3 app on
>> Kubernetes v1.8.3; some of the executor pods were killed with the below
>> error as soon as they came up:
>>
>> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
>>
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
>>
>> at
>> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
>>
>> at
>> org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
>>
>> at
>> org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:293)
>>
>> at
>> org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
>>
>> Caused by: org.apache.spark.SparkException: Exception thrown in
>> awaitResult:
>>
>> at
>> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
>>
>> at
>> org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>>
>> at
>> org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
>>
>> at
>> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
>>
>> at
>> org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
>>
>> at
>> org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
>>
>> at java.security.AccessController.doPrivileged(Native
>> Method)
>>
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>>
>> ... 4 more
>>
>> Caused by: java.io.IOException: Failed to connect to
>> spark-1527629824987-driver-svc.spark.svc:7078
>>
>> at
>> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
>>
>> at
>> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
>>
>> at
>> org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
>>
>> at
>> org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
>>
>> at
>> org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
>>
>> at
>> java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>
>> at java.lang.Thread.run(Thread.java:748)
>>
>> Caused by: java.net.UnknownHostException:
>> spark-1527629824987-driver-svc.spark.svc
>>
>> at
>> java.net.InetAddress.getAllByName0(InetAddress.java:1280)
>>
>> at
>> java.net.InetAddress.getAllByName(InetAddress.java:1192)
>>
>> at
>> java.net.InetAddress.getAllByName(InetAddress.java:1126)
>>
>> at java.net.InetAddress.getByName(InetAddress.java:1076)
>>
>> at
>> io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
>>
>> at
>> io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
>>
>> at java.security.AccessController.doPrivileged(Native
>> Method)
>>
>> at
>> io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)
>>
>> at
>> io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)
>>
>> at
>> io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)
>>
>> at
>> io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)
>>
>> at
>> io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)
>>
>> at
>> io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32)
>>
>> at
>> 

Re: Spark 2.3 error on Kubernetes

2018-05-29 Thread Anirudh Ramanathan
This looks to me like a kube-dns error that's causing the driver DNS
address to not resolve.
It would be worth double checking that kube-dns is indeed running (in the
kube-system namespace).
Often, with environments like minikube, kube-dns may exit/crashloop due to
lack of resource.

On Tue, May 29, 2018 at 3:18 PM, purna pradeep 
wrote:

> Hello,
>
> I’m getting the below error when I spark-submit a Spark 2.3 app on Kubernetes
> v1.8.3; some of the executor pods were killed with the below error as soon as
> they came up:
>
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
>
> at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1713)
>
> at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(
> SparkHadoopUtil.scala:64)
>
> at org.apache.spark.executor.
> CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
>
> at org.apache.spark.executor.
> CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:293)
>
> at org.apache.spark.executor.CoarseGrainedExecutorBackend.
> main(CoarseGrainedExecutorBackend.scala)
>
> Caused by: org.apache.spark.SparkException: Exception thrown in
> awaitResult:
>
> at org.apache.spark.util.ThreadUtils$.awaitResult(
> ThreadUtils.scala:205)
>
> at org.apache.spark.rpc.RpcTimeout.awaitResult(
> RpcTimeout.scala:75)
>
> at org.apache.spark.rpc.RpcEnv.
> setupEndpointRefByURI(RpcEnv.scala:101)
>
> at org.apache.spark.executor.
> CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(
> CoarseGrainedExecutorBackend.scala:201)
>
> at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(
> SparkHadoopUtil.scala:65)
>
> at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(
> SparkHadoopUtil.scala:64)
>
> at java.security.AccessController.doPrivileged(Native
> Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:422)
>
> at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1698)
>
> ... 4 more
>
> Caused by: java.io.IOException: Failed to connect to
> spark-1527629824987-driver-svc.spark.svc:7078
>
> at org.apache.spark.network.client.TransportClientFactory.
> createClient(TransportClientFactory.java:245)
>
> at org.apache.spark.network.client.TransportClientFactory.
> createClient(TransportClientFactory.java:187)
>
> at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(
> NettyRpcEnv.scala:198)
>
> at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.
> scala:194)
>
> at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.
> scala:190)
>
> at java.util.concurrent.FutureTask.run(FutureTask.
> java:266)
>
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1149)
>
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:624)
>
> at java.lang.Thread.run(Thread.java:748)
>
> Caused by: java.net.UnknownHostException: spark-1527629824987-driver-
> svc.spark.svc
>
> at java.net.InetAddress.getAllByName0(InetAddress.
> java:1280)
>
> at java.net.InetAddress.getAllByName(InetAddress.java:
> 1192)
>
> at java.net.InetAddress.getAllByName(InetAddress.java:
> 1126)
>
> at java.net.InetAddress.getByName(InetAddress.java:1076)
>
> at io.netty.util.internal.SocketUtils$8.run(SocketUtils.
> java:146)
>
> at io.netty.util.internal.SocketUtils$8.run(SocketUtils.
> java:143)
>
> at java.security.AccessController.doPrivileged(Native
> Method)
>
> at io.netty.util.internal.SocketUtils.addressByName(
> SocketUtils.java:143)
>
> at io.netty.resolver.DefaultNameResolver.doResolve(
> DefaultNameResolver.java:43)
>
> at io.netty.resolver.SimpleNameResolver.resolve(
> SimpleNameResolver.java:63)
>
> at io.netty.resolver.SimpleNameResolver.resolve(
> SimpleNameResolver.java:55)
>
> at io.netty.resolver.InetSocketAddressResolver.doResolve(
> InetSocketAddressResolver.java:57)
>
> at io.netty.resolver.InetSocketAddressResolver.doResolve(
> InetSocketAddressResolver.java:32)
>
> at io.netty.resolver.AbstractAddressResolver.resolve(
> AbstractAddressResolver.java:108)
>
> at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(
> Bootstrap.java:208)
>
> at io.netty.bootstrap.Bootstrap.
> access$000(Bootstrap.java:49)
>
> at io.netty.bootstrap.Bootstrap$
> 1.operationComplete(Bootstrap.java:188)
>
> at io.netty.bootstrap.Bootstrap$
> 

Spark 2.3 error on Kubernetes

2018-05-29 Thread purna pradeep
Hello,

I’m getting the below error when I spark-submit a Spark 2.3 app on Kubernetes
v1.8.3; some of the executor pods were killed with the below error as soon as
they came up:

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException

at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)

at
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)

at
org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)

at
org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:293)

at
org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)

Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:

at
org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)

at
org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)

at
org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)

at
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)

at
org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)

at
org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)

at java.security.AccessController.doPrivileged(Native
Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)

... 4 more

Caused by: java.io.IOException: Failed to connect to
spark-1527629824987-driver-svc.spark.svc:7078

at
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)

at
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)

at
org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)

at
org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)

at
org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

Caused by: java.net.UnknownHostException:
spark-1527629824987-driver-svc.spark.svc

at java.net.InetAddress.getAllByName0(InetAddress.java:1280)

at java.net.InetAddress.getAllByName(InetAddress.java:1192)

at java.net.InetAddress.getAllByName(InetAddress.java:1126)

at java.net.InetAddress.getByName(InetAddress.java:1076)

at
io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)

at
io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)

at java.security.AccessController.doPrivileged(Native
Method)

at
io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)

at
io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)

at
io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)

at
io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)

at
io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)

at
io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32)

at
io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108)

at
io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:208)

at
io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:49)

at
io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:188)

at
io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:174)

at
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)

at
io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)

at
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)

at
io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)

at
io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)

at

Spark 2.3 error on kubernetes

2018-05-29 Thread Mamillapalli, Purna Pradeep
Hello,


I’m getting the below intermittent error when I spark-submit a Spark 2.3 app on
Kubernetes v1.8.3; some of the executor pods were killed with the below error as
soon as they came up:


Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
at 
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:293)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
at 
org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at 
org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at 
org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
... 4 more
Caused by: java.io.IOException: Failed to connect to 
spark-1527629824987-driver-svc.spark.svc:7078
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at 
org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
at 
org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at 
org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: 
spark-1527629824987-driver-svc.spark.svc
at java.net.InetAddress.getAllByName0(InetAddress.java:1280)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at java.net.InetAddress.getByName(InetAddress.java:1076)
at 
io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
at 
io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
at java.security.AccessController.doPrivileged(Native Method)
at 
io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)
at 
io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)
at 
io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)
at 
io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)
at 
io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)
at 
io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32)
at 
io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108)
at 
io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:208)
at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:49)
at 
io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:188)
at 
io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:174)
at 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
at 
io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
at 
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
at 
io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
at 
io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
at 

Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Chetan Khatri
Georg, sorry for the dumb question. Help me to understand: if I do
DF.select(A,B,C,D).distinct(), would that be the same as the above groupBy
without agg in SQL?

On Wed, May 30, 2018 at 12:17 AM, Chetan Khatri  wrote:

> I don't want any aggregation; I just want to know whether there is a better
> approach than applying distinct to all columns.
>
> On Wed, May 30, 2018 at 12:16 AM, Irving Duran 
> wrote:
>
>> Unless you want to get a count, yes.
>>
>> Thank You,
>>
>> Irving Duran
>>
>>
>> On Tue, May 29, 2018 at 1:44 PM Chetan Khatri <
>> chetan.opensou...@gmail.com> wrote:
>>
>>> Georg, I just want to double-check: someone wrote an MSSQL Server script
>>> that groups by all columns. What is the best alternative way to do distinct
>>> on all columns?
>>>
>>>
>>>
>>> On Wed, May 30, 2018 at 12:08 AM, Georg Heiler <
>>> georg.kf.hei...@gmail.com> wrote:
>>>
 Why do you group if you do not want to aggregate?
 Isn't this the same as select distinct?

 Chetan Khatri  wrote on Tue., May 29,
 2018 at 20:21:

> All,
>
> I have a scenario like this in MSSQL Server SQL where I need to do a
> groupBy without an agg function:
>
> Pseudocode:
>
> select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
> from student as m
> inner join general_register g on m.student_id = g.student_id
> group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
>
> I tried doing this in Spark but I am not able to get a DataFrame as the
> return value. How could this kind of thing be done in Spark?
>
> Thanks
>

>>>
>


Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Chetan Khatri
I don't want any aggregation; I just want to know whether there is a better
approach than applying distinct to all columns.

On Wed, May 30, 2018 at 12:16 AM, Irving Duran 
wrote:

> Unless you want to get a count, yes.
>
> Thank You,
>
> Irving Duran
>
>
> On Tue, May 29, 2018 at 1:44 PM Chetan Khatri 
> wrote:
>
>> Georg, I just want to double-check: someone wrote an MSSQL Server script
>> that groups by all columns. What is the best alternative way to do distinct
>> on all columns?
>>
>>
>>
>> On Wed, May 30, 2018 at 12:08 AM, Georg Heiler  wrote:
>>
>>> Why do you group if you do not want to aggregate?
>>> Isn't this the same as select distinct?
>>>
>>> Chetan Khatri  wrote on Tue., May 29,
>>> 2018 at 20:21:
>>>
 All,

 I have a scenario like this in MSSQL Server SQL where I need to do a
 groupBy without an agg function:

 Pseudocode:

 select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
 from student as m
 inner join general_register g on m.student_id = g.student_id
 group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob

 I tried doing this in Spark but I am not able to get a DataFrame as the
 return value. How could this kind of thing be done in Spark?

 Thanks

>>>
>>


Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Irving Duran
Unless you want to get a count, yes.

Thank You,

Irving Duran


On Tue, May 29, 2018 at 1:44 PM Chetan Khatri 
wrote:

> Georg, I just want to double-check: someone wrote an MSSQL Server script
> that groups by all columns. What is the best alternative way to do distinct
> on all columns?
>
>
>
> On Wed, May 30, 2018 at 12:08 AM, Georg Heiler 
> wrote:
>
>> Why do you group if you do not want to aggregate?
>> Isn't this the same as select distinct?
>>
>> Chetan Khatri  wrote on Tue., May 29, 2018
>> at 20:21:
>>
>>> All,
>>>
>>> I have a scenario like this in MSSQL Server SQL where I need to do a groupBy
>>> without an agg function:
>>>
>>> Pseudocode:
>>>
>>> select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
>>> from student as m
>>> inner join general_register g on m.student_id = g.student_id
>>> group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
>>>
>>> I tried doing this in Spark but I am not able to get a DataFrame as the
>>> return value. How could this kind of thing be done in Spark?
>>>
>>> Thanks
>>>
>>
>


Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Chetan Khatri
Georg, I just want to double-check: someone wrote an MSSQL Server script that
groups by all columns. What is the best alternative way to do distinct on all
columns?



On Wed, May 30, 2018 at 12:08 AM, Georg Heiler 
wrote:

> Why do you group if you do not want to aggregate?
> Isn't this the same as select distinct?
>
> Chetan Khatri  wrote on Tue., May 29, 2018
> at 20:21:
>
>> All,
>>
>> I have a scenario like this in MSSQL Server SQL where I need to do a groupBy
>> without an agg function:
>>
>> Pseudocode:
>>
>> select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
>> from student as m
>> inner join general_register g on m.student_id = g.student_id
>> group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
>>
>> I tried doing this in Spark but I am not able to get a DataFrame as the
>> return value. How could this kind of thing be done in Spark?
>>
>> Thanks
>>
>


Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Georg Heiler
Why do you group if you do not want to aggregate?
Isn't this the same as select distinct?

Chetan Khatri  wrote on Tue., May 29, 2018 at
20:21:

> All,
>
> I have a scenario like this in MSSQL Server SQL where I need to do a groupBy
> without an agg function:
>
> Pseudocode:
>
> select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
> from student as m
> inner join general_register g on m.student_id = g.student_id
> group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
>
> I tried doing this in Spark but I am not able to get a DataFrame as the
> return value. How could this kind of thing be done in Spark?
>
> Thanks
>


GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Chetan Khatri
All,

I have a scenario like this in MSSQL Server SQL where I need to do a groupBy
without an agg function:

Pseudocode:

select m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob
from student as m
inner join general_register g on m.student_id = g.student_id
group by m.student_id, m.student_name, m.student_std, m.student_group, m.student_dob

I tried doing this in Spark but I am not able to get a DataFrame as the return
value. How could this kind of thing be done in Spark?

Thanks
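For illustration, a direct DataFrame translation of the pseudocode above (a sketch only: it assumes DataFrames named studentDF and generalRegisterDF have already been loaded with the columns used in the query, and it relies on .distinct(), which, as noted in the replies, Spark plans as a group-by over all selected columns):

import spark.implicits._

val result = studentDF.as("m")
  .join(generalRegisterDF.as("g"), $"m.student_id" === $"g.student_id", "inner")
  .select($"m.student_id", $"m.student_name", $"m.student_std",
          $"m.student_group", $"m.student_dob")
  .distinct()  // stands in for the GROUP BY over the same columns, no agg needed

// result is an ordinary DataFrame and can be returned from a function as-is.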


Re: Positive log-likelihood with Gaussian mixture

2018-05-29 Thread Simon Dirmeier

Hey,

Sorry for the late reply. I cannot share the data, but the problem can be
reproduced easily, as below.
I wanted to check with sklearn and observed a similar behaviour, i.e. a
positive per-sample average log-likelihood
(http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture.score).

I don't think it is necessarily an issue with the implementation; maybe it is
due to parameter identifiability or so?

As far as I can tell, the variances seem to be ok.

Thanks for looking into this.

Best,
Simon
import scipy
import sklearn.mixture
import pyspark.ml.clustering
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture
from pyspark.ml.linalg import Vectors

# Sample 100 points from a 10-dimensional Gaussian and build a Spark DataFrame.
scipy.random.seed(23)
X = multivariate_normal.rvs(mean=scipy.ones(10), size=100)

dff = map(lambda x: (int(x[0]), Vectors.dense(x[0:])), X)
df = spark.createDataFrame(dff, schema=["label", "features"])

# Fit a 10-component GMM on shrinking subsets of the data with both Spark ML and
# sklearn, and compare Spark's summary.logLikelihood with sklearn's per-sample
# average log-likelihood (score).
for i in [100, 90, 80, 70, 60, 50]:
    km = pyspark.ml.clustering.GaussianMixture(k=10, seed=23).fit(df.limit(i))
    sk_gmm = sklearn.mixture.GaussianMixture(10, random_state=23).fit(X[:i, :])
    print(df.limit(i).count(), X[:i, :].shape[0],
          km.summary.logLikelihood, sk_gmm.score(X[:i, :]))

100 100 368.37475644171036 -1.54949312502
90 90 1026.084529101155 1.16196607062
80 80 2245.427539835042 4.25769131857
70 70 1940.0122633489268 10.0949992881
60 60 2255.002313247103 14.0497823725
50 50 -140.82605873444814 21.2423016046
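
For what it's worth, a positive log-likelihood is not by itself evidence of a bug for continuous models, because the quantity being summed or averaged is a log density, and densities can exceed 1. For a single Gaussian component,

\log \mathcal{N}(x \mid \mu_k, \Sigma_k) = -\tfrac{d}{2}\log(2\pi) - \tfrac{1}{2}\log\lvert\Sigma_k\rvert - \tfrac{1}{2}(x-\mu_k)^\top \Sigma_k^{-1}(x-\mu_k),

which is positive whenever \lvert\Sigma_k\rvert is small enough and x is close to \mu_k; in one dimension, 1/(\sigma\sqrt{2\pi}) > 1 as soon as \sigma < (2\pi)^{-1/2} \approx 0.40. With k=10 components fitted to only 50-100 points, very tight components are easy to obtain. Note also that the two likelihood columns above are not directly comparable if Spark's summary.logLikelihood is a total over rows while sklearn's score is, per the linked documentation, a per-sample average.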