Re: how "hour" function in Spark SQL is supposed to work?

2018-03-11 Thread vermanurag
Not sure why you are dividing by 1000. from_unixtime expects a long value giving the time in seconds from the Unix epoch. The following should work: val ds = dataset.withColumn("hour", hour(from_unixtime(dataset.col("ts"))))
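
A minimal runnable sketch of that suggestion, assuming the ts column already holds epoch seconds (the sample value and names are illustrative):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{from_unixtime, hour}

    object HourFromTs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.master("local[*]").appName("hour-demo").getOrCreate()
        import spark.implicits._

        // ts holds seconds since the Unix epoch (2018-03-11 15:30:00 UTC here)
        val dataset = Seq(1520782200L).toDF("ts")

        // from_unixtime renders ts in the session time zone; hour then extracts the hour
        val ds = dataset.withColumn("hour", hour(from_unixtime(dataset.col("ts"))))
        ds.show()
      }
    }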

The last successful batch before stop re-executes after restarting DStreams from a checkpoint

2018-03-11 Thread Terry Hoo
Experts, I see that the last batch before a stop (graceful shutdown) always re-executes after the DStream is restarted from a checkpoint. Is this expected behavior? I see a bug in JIRA: https://issues.apache.org/jira/browse/SPARK-20050, which reports duplicates on Kafka; I also see this with HDFS files.
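
For context, a minimal sketch of the checkpoint-recovery pattern being discussed (the path, source, and batch interval are illustrative). Checkpoint recovery gives at-least-once semantics, so the last pre-shutdown batch can be replayed and downstream writes should be idempotent:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object CheckpointRestart {
      // Hypothetical checkpoint directory; any fault-tolerant filesystem works
      val checkpointDir = "hdfs:///tmp/stream-checkpoint"

      def createContext(): StreamingContext = {
        val conf = new SparkConf().setAppName("checkpoint-demo").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint(checkpointDir)
        // Illustrative source; on restart, unfinished batches are replayed
        val lines = ssc.socketTextStream("localhost", 9999)
        lines.count().print()
        ssc
      }

      def main(args: Array[String]): Unit = {
        // Recovers from the checkpoint if one exists; otherwise builds a fresh context
        val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
        ssc.start()
        ssc.awaitTermination()
      }
    }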

Spark SQL: result-fetch time larger than compute duration

2018-03-11 Thread wkhapy_1
Getting the result takes 1.67s while the compute cost is only 0.2s. Below is the SQL: select event_date, dim, concat_ws('|', collect_list(result)) result from ( select event_day
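
For reference, a hedged DataFrame-API sketch of that kind of aggregation (the schema and data are made up). The concat_ws('|', collect_list(...)) pattern gathers all group values onto single rows, so fetching the result can dominate an otherwise cheap compute step:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{collect_list, concat_ws}

    object CollectListDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.master("local[*]").appName("collect-list").getOrCreate()
        import spark.implicits._

        // Illustrative rows standing in for the inner sub-select
        val rows = Seq(
          ("2018-03-11", "dimA", "r1"),
          ("2018-03-11", "dimA", "r2"),
          ("2018-03-11", "dimB", "r3")
        ).toDF("event_date", "dim", "result")

        // Equivalent of: select event_date, dim, concat_ws('|', collect_list(result)) result ...
        val agg = rows
          .groupBy($"event_date", $"dim")
          .agg(concat_ws("|", collect_list($"result")).as("result"))
        agg.show(truncate = false)
      }
    }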

Re: Spark 2.3 submit on Kubernetes error

2018-03-11 Thread Yinan Li
Spark on Kubernetes requires the kube-dns add-on to be present and properly configured. The executors connect to the driver through a headless Kubernetes service, using the DNS name of that service. Can you check if you have the add-on installed in your cluster? This issue
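
A quick way to test that diagnosis, sketched in Scala (the service name below is hypothetical): try to resolve the driver service's DNS name from a pod in the same namespace; if resolution fails, the kube-dns add-on is likely missing or misconfigured:

    import java.net.InetAddress

    object DnsCheck {
      def main(args: Array[String]): Unit = {
        // Hypothetical headless-service name; pass the real one as an argument
        val host = args.headOption.getOrElse("spark-driver-svc.default.svc.cluster.local")
        try {
          val addr = InetAddress.getByName(host)
          println(s"resolved $host -> ${addr.getHostAddress}")
        } catch {
          case e: java.net.UnknownHostException =>
            println(s"could not resolve $host; kube-dns may be missing: $e")
        }
      }
    }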

Spark 2.3 submit on Kubernetes error

2018-03-11 Thread purna pradeep
Getting the below errors when I'm trying to run spark-submit on a k8s cluster. *Error 1*: This looks like a warning; it doesn't interrupt the app running inside the executor pod, but this warning keeps appearing: *2018-03-09 11:15:21 WARN WatchConnectionManager:192 - Exec Failure*

Debugging a local spark executor in pycharm

2018-03-11 Thread Vitaliy Pisarev
I want to step through the work of a Spark executor running locally on my machine, from PyCharm. I am running explicit functionality, in the form of dataset.foreachPartition(f), and I want to see what is going on inside f. Is there a straightforward way to do it, or do I need to resort to remote
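
One common approach, sketched here in Scala for the JVM case (PySpark workers are separate processes, so PyCharm may still need its remote-debug setup): run with a local master so the partition function executes in the same process, and a breakpoint inside it is hit directly:

    import org.apache.spark.sql.SparkSession

    object DebugForeachPartition {
      def main(args: Array[String]): Unit = {
        // local[1]: driver and executor share one JVM and one core, which
        // keeps stepping through partition code deterministic
        val spark = SparkSession.builder.master("local[1]").appName("debug-demo").getOrCreate()
        import spark.implicits._

        val dataset = Seq(1, 2, 3, 4).toDS()

        // Set a breakpoint inside this function; in local mode it runs in-process
        dataset.foreachPartition { (it: Iterator[Int]) =>
          it.foreach(x => println(s"processing $x"))
        }
      }
    }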

how "hour" function in Spark SQL is supposed to work?

2018-03-11 Thread Serega Sheypak
hi, desperately trying to extract the hour from unix seconds. The year, month, and dayofmonth functions work as expected, but the hour function always returns 0. val ds = dataset.withColumn("year", year(to_date(from_unixtime(dataset.col("ts") / 1000)))).withColumn("month",
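
If the hour column was built the same way as year, the zero is explained by to_date: it truncates the timestamp to a date, so hour(to_date(...)) is always 0. A minimal sketch, assuming ts holds epoch milliseconds (hence the division by 1000): drop to_date for time-of-day fields:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{from_unixtime, hour, to_date, year}

    object HourFix {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.master("local[*]").appName("hour-fix").getOrCreate()
        import spark.implicits._

        // ts in epoch milliseconds
        val dataset = Seq(1520782200000L).toDF("ts")
        val secs = from_unixtime(dataset.col("ts") / 1000)

        val ds = dataset
          .withColumn("year", year(to_date(secs)))     // date-level fields survive to_date
          .withColumn("hour_bad", hour(to_date(secs))) // always 0: to_date drops the time
          .withColumn("hour", hour(secs))              // works: hour of the full timestamp
        ds.show()
      }
    }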

Error running multinomial regression on a dataset with a field having a constant value

2018-03-11 Thread kundan kumar
I am running the sample multinomial regression code given in the Spark docs (version 2.2.0): LogisticRegression lr = new LogisticRegression().setMaxIter(100).setRegParam(0.3).setElasticNetParam(0.8); LogisticRegressionModel lrModel = lr.fit(training); But in the dataset I am adding a constant field
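
One workaround worth trying, as a hedged sketch (column names and data are illustrative): a constant field has zero variance and carries no information, so filter such columns out before assembling the feature vector:

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.variance

    object DropConstantFeatures {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.master("local[*]").appName("const-field").getOrCreate()
        import spark.implicits._

        // Toy data: f2 is constant, label has three classes (multinomial)
        val df = Seq(
          (0.0, 1.0, 5.0, 0.3),
          (1.0, 2.0, 5.0, 0.1),
          (2.0, 3.0, 5.0, 0.9),
          (0.0, 1.5, 5.0, 0.4)
        ).toDF("label", "f1", "f2", "f3")

        // Keep only feature columns whose sample variance is non-zero
        val candidates = Seq("f1", "f2", "f3")
        val keep = candidates.filter { c =>
          df.agg(variance(df(c))).first().getDouble(0) > 0.0
        }

        val training = new VectorAssembler()
          .setInputCols(keep.toArray)
          .setOutputCol("features")
          .transform(df)

        // Same hyperparameters as the snippet from the docs
        val lr = new LogisticRegression().setMaxIter(100).setRegParam(0.3).setElasticNetParam(0.8)
        val lrModel = lr.fit(training)
        println(s"coefficients:\n${lrModel.coefficientMatrix}")
      }
    }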