Hi,
Spark 2.0 doesn't support the Hive STORED BY clause. Is there any
alternative to achieve the same?
Hi All,
DataWorks Summit, San Jose, 2018 is a good place to share your experience
with advanced analytics, data science, machine learning and deep learning.
We have an Artificial Intelligence and Data Science session covering
technologies such as:
Apache Spark, Scikit-learn, TensorFlow, Keras,
Hello,
We have a Spark cluster with 3 worker nodes running as EC2 instances on AWS.
The Spark application is running in cluster mode and the checkpoints are
stored in EFS. The Spark version used is 2.2.0.
We noticed the error below coming up; our understanding was that this
intermittent checkpoint issue
The other way might be to launch a single SparkContext and then run jobs
inside of it.
You can take a look at these projects:
- https://github.com/spark-jobserver/spark-jobserver#persistent-context-mode---faster--required-for-related-jobs
- http://livy.incubator.apache.org
Problems with this
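The shared-context idea above can be sketched directly, without either
project: one long-lived SparkSession submits several jobs in sequence, so
executors and cached data survive between them (names like "shared-context"
below are illustrative, not from either project):

```scala
// Minimal sketch: a single SparkContext serving multiple jobs, instead of
// launching a new Spark application per job.
import org.apache.spark.sql.SparkSession

object SharedContext {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shared-context").getOrCreate()
    val sc = spark.sparkContext

    // Job 1: a simple aggregation.
    val total = sc.parallelize(1 to 100).sum()

    // Job 2 reuses the same context; no new application is launched,
    // so scheduling overhead stays low and caches remain warm.
    val evens = sc.parallelize(1 to 100).filter(_ % 2 == 0).count()

    println(s"sum=$total evens=$evens")
    spark.stop()
  }
}
```

spark-jobserver and Livy essentially wrap this pattern behind a REST API so
that independent clients can submit work into the persistent context.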
Hi all,
In Spark 2.2.1, when I load Parquet files, the result shows a different row
order from the original dataset.
It seems that the FileSourceScanExec.createNonBucketedReadRDD method sorts
the Parquet file splits by their own lengths.
-
val splitFiles = selectedPartitions.flatMap { partition =>
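More generally, row order after reading Parquet is not guaranteed, since
splits are scheduled by size. If a stable order is required, one workaround
is to persist an explicit ordering column at write time and sort on it at
read time. A minimal sketch (the column name "seq" and the path are made up
for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

val spark = SparkSession.builder().appName("ordered-read").getOrCreate()
import spark.implicits._

// On write: attach a monotonically increasing id to preserve row order.
val df = Seq("a", "b", "c").toDF("value")
  .withColumn("seq", monotonically_increasing_id())
df.write.mode("overwrite").parquet("/tmp/ordered")

// On read: sorting by the saved column recovers the original order,
// regardless of how the file splits were scheduled.
val restored = spark.read.parquet("/tmp/ordered").orderBy("seq")
```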
Hello
I am trying to use CEP in Spark on log files (as a batch job), not on
streams (in real time).
Is that possible? If yes, do you know of any example Scala code for that?
Or should I convert the log files (with timestamps) into streams?
But how would I handle the timestamps in Spark?
If I can
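Spark itself does not ship a CEP library (CEP is more commonly associated
with Flink), but for a batch job over log files one option is to parse the
timestamps into a timestamp column and express simple event patterns with
window functions. A sketch under assumed inputs: the log format, the path,
and the "two FAILURE events within 60 seconds" pattern are all hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window

val spark = SparkSession.builder().appName("log-batch").getOrCreate()
import spark.implicits._

// Hypothetical log line: "2018-03-01 10:15:02 FAILURE user1"
val logs = spark.read.textFile("/path/to/logs")   // placeholder path
  .map { line =>
    val Array(date, time, event, user) = line.split(" ", 4)
    (s"$date $time", event, user)
  }
  .toDF("ts_raw", "event", "user")
  .withColumn("ts", to_timestamp($"ts_raw", "yyyy-MM-dd HH:mm:ss"))

// Example pattern: two consecutive FAILURE events from the same user
// within 60 seconds, detected with lag() over an event-time ordering.
val w = Window.partitionBy("user").orderBy("ts")
val suspicious = logs
  .withColumn("prev_event", lag("event", 1).over(w))
  .withColumn("prev_ts", lag("ts", 1).over(w))
  .filter($"event" === "FAILURE" && $"prev_event" === "FAILURE" &&
          unix_timestamp($"ts") - unix_timestamp($"prev_ts") <= 60)
```

Because the timestamps live in an ordinary column, event time is handled by
sorting within the window; no streaming conversion is needed for a batch job.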