Re: [Spark in Kubernetes] Question about running in client mode

2021-04-26 Thread Attila Zsolt Piros
Hi Shiqi, In client mode the driver runs locally: on the same machine, even in the same process, as spark-submit. So if the application was submitted from inside a running pod, the driver will be running in that pod; if it was submitted from outside Kubernetes, it will be running outside. This is why there
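A minimal sketch of what this implies for a client-mode submission from inside a pod. The image name, service hostname, and jar path below are placeholders for illustration, not values from the thread:

```shell
# Client mode: the driver starts in the same process/host as spark-submit.
# Run this from inside a pod and the driver lives in that pod; run it from
# a machine outside the cluster and the driver runs there instead.
# Executors must be able to reach the driver back, which is why
# spark.driver.host is typically pointed at a (headless) service.
spark-submit \
  --master k8s://https://kubernetes.default.svc \
  --deploy-mode client \
  --conf spark.kubernetes.container.image=my-spark-image:latest \
  --conf spark.driver.host=my-driver-svc.my-namespace.svc.cluster.local \
  local:///opt/spark/examples/jars/spark-examples.jar
```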

com.google.protobuf.Parser.parseFrom() method Can't use in spark

2021-04-26 Thread null
hi, I have a TensorFlow 2 model and need to use it with Spark 2.4, but I failed to load it in Spark (Java or Scala): scala> import org.{tensorflow => tf} import org.{tensorflow=>tf} scala> val bundle = tf.SavedModelBundle.load("/home/hadoop/xDeepFM","serve") 2021-04-23 07:32:56.223881: I

[Spark in Kubernetes] Question about running in client mode

2021-04-26 Thread Shiqi Sun
Hi Spark User group, I have a couple of quick questions about running Spark in Kubernetes between different deploy modes. As specified in https://spark.apache.org/docs/latest/running-on-kubernetes.html#client-mode, since Spark 2.4, client mode support is available when running in Kubernetes, and

Spark Streaming non functional requirements

2021-04-26 Thread ashok34...@yahoo.com.INVALID
Hello, When we design a typical Spark Streaming process, the focus is on functional requirements. However, I have been asked to provide non-functional requirements as well. Likely things I can consider are fault tolerance and reliability (component failures). Is there a standard list of

Re: [Spark-Streaming] moving average on categorical data with time windowing

2021-04-26 Thread Sean Owen
You might be able to do this with multiple aggregations on avg(col("col1") === "cat1") etc, but how about pivoting the DataFrame first, so that you get indicator columns like "cat1" holding 1 or 0? You would end up with (columns x categories) new columns if you want to count all categories in all cols. But then
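To illustrate the arithmetic behind this suggestion without needing a Spark cluster, here is a plain-Python sketch: pivot a categorical column into 0/1 indicator values, then take a moving average of each indicator, which is exactly the fraction of rows in the window having that category. The row values and category names mirror the sample data from the original post; the function name is mine, and in Spark itself this would be expressed with pivot/when plus window functions instead.

```python
from collections import deque

# Sample values for a single categorical column, echoing the thread's data.
rows = ["cat1", "cat1", "cat5", "cat2", "cat1"]
categories = ["cat1", "cat2", "cat3", "cat4", "cat5"]

def moving_avg_of_indicators(rows, categories, window=3):
    """For each category, the rolling mean of its 0/1 indicator over
    the last `window` rows, i.e. the fraction of the window equal to it."""
    out = []
    buf = deque(maxlen=window)  # sliding window of recent values
    for value in rows:
        buf.append(value)
        # avg of the indicator (value == c) over the current window
        out.append({c: sum(v == c for v in buf) / len(buf) for c in categories})
    return out

result = moving_avg_of_indicators(rows, categories, window=3)
```

After the third row the window holds [cat1, cat1, cat5], so the moving average for "cat1" there is 2/3, which is the category frequency the pivot-then-average approach computes.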

[Spark-Streaming] moving average on categorical data with time windowing

2021-04-26 Thread halil
Hello everyone, I am trying to apply moving average on categorical data like below, which is a synthetic data generated by myself. sqltimestamp,col1,col2,col3,col4,col5 1618574879,cat1,cat4,cat2,cat5,cat3 1618574880,cat1,cat3,cat4,cat2,cat5 1618574881,cat5,cat3,cat4,cat2,cat1

Bintray replacement for spark-packages.org

2021-04-26 Thread Bo Zhang
Hi Apache Spark users, As you might know, Bintray, which is the repository service used for spark-packages.org, is in its sunset process. There was a planned brown-out on April 12th and there will be another one on April 26th
