Re: Triggering SQL on AWS S3 via Apache Spark

2018-10-23 Thread Jörn Franke
Why not directly access the S3 file from Spark? You need to configure the IAM roles so that the machine running the S3 code is allowed to access the bucket.
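
A minimal sketch of that direct access, assuming the hadoop-aws (s3a) connector is on the classpath and an EC2 instance-profile IAM role grants read access to the bucket, so no keys need to be configured; the bucket and object names are made up:

  // Read a CSV straight from S3 via the s3a connector. In spark-shell,
  // `spark` is already provided. With an instance-profile IAM role no
  // access keys are required; otherwise set
  // spark.hadoop.fs.s3a.access.key / spark.hadoop.fs.s3a.secret.key.
  val df = spark.read
    .option("header", "true")
    .csv("s3a://my-crm-bucket/customers.csv")  // hypothetical bucket/key

  df.show(5)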

Re: Triggering SQL on AWS S3 via Apache Spark

2018-10-23 Thread Divya Gehlot
Hi Omer, Here are a couple of solutions you can implement for your use case: *Option 1:* you can mount the S3 bucket as a local file system. Here are the details: https://cloud.netapp.com/blog/amazon-s3-as-a-file-system *Option 2:* You can use AWS Glue for your use case; here are the
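
If the bucket is mounted as in Option 1, Spark can read it like any local directory. A minimal sketch, assuming a FUSE mount (e.g. s3fs or Goofys) at a hypothetical mount point:

  // Once the bucket is FUSE-mounted, Spark sees an ordinary local path.
  // The mount point and file name below are made up for illustration.
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("MountedS3Read").getOrCreate()

  val df = spark.read
    .option("header", "true")
    .csv("file:///mnt/s3/customers.csv")  // file:// forces the local FS

  println(df.count())

Note that FUSE mounts add per-call overhead; for bulk reads, the s3a connector suggested in the other reply is generally the faster route.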

Re: ALS block settings

2018-10-23 Thread evanzamir
I have the same question. I'm trying to figure out how to get ALS to complete with a larger dataset. It seems to get stuck on "Count" from what I can tell. I'm running 8 r4.4xlarge instances on Amazon EMR. The dataset is 80 GB (just to give some idea of the size). I assumed Spark could handle this, but
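
For reference, the block settings the subject line refers to look like this in spark.ml. A sketch only, with illustrative (not tuned) values, assuming a ratings DataFrame with user/item/rating columns already exists; raising the block counts, checkpointing, and spilling intermediate state to disk are the usual levers when a large ALS job stalls:

  // ALS with explicit block settings and checkpointing; block counts
  // and the checkpoint path are illustrative. ratingsDF is assumed.
  import org.apache.spark.ml.recommendation.ALS

  spark.sparkContext.setCheckpointDir("hdfs:///tmp/als-checkpoints")  // hypothetical dir

  val als = new ALS()
    .setUserCol("user")
    .setItemCol("item")
    .setRatingCol("rating")
    .setRank(50)
    .setMaxIter(10)
    .setNumUserBlocks(64)      // more blocks => smaller per-task partitions
    .setNumItemBlocks(64)
    .setCheckpointInterval(5)  // truncates lineage between iterations
    .setIntermediateStorageLevel("MEMORY_AND_DISK")

  val model = als.fit(ratingsDF)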

Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-23 Thread Patrick Brown
I believe I may be able to reproduce this now; it seems like it may be something to do with many jobs at once:

Spark 2.3.1

  > spark-shell --conf spark.ui.retainedJobs=1

  scala> import scala.concurrent._
  scala> import scala.concurrent.ExecutionContext.Implicits.global
  scala> for (i <- 0 until
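
The preview cuts off mid-loop. A plausible completion, purely a guess at the original intent (the loop bound and job body are invented), firing many tiny jobs concurrently so the UI's job list churns:

  scala> for (i <- 0 until 1000) { Future { sc.parallelize(1 to 10).count() } }

With spark.ui.retainedJobs=1, only the most recent job should be retained in the UI after such a burst.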

Triggering SQL on AWS S3 via Apache Spark

2018-10-23 Thread Omer.Ozsakarya
Hi guys, We are using Apache Spark on a local machine. I need to implement the scenario below. In the initial load: 1. The CRM application will send a file to a folder. This file contains customer information for all customers. The file is in a folder on the local server. The file name is:
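
A minimal sketch of that initial load under stated assumptions: the CRM drops a CSV of all customers into a local folder, and Spark loads it and runs SQL over it. The folder and file names are placeholders, since the message is truncated before the real name appears:

  // Read the full customer extract from a local folder and query it
  // with Spark SQL. All paths and names here are hypothetical.
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("CrmInitialLoad")
    .master("local[*]")  // the poster runs Spark on a local machine
    .getOrCreate()

  val customers = spark.read
    .option("header", "true")
    .csv("file:///data/crm/inbox/customers_full.csv")  // made-up file name

  customers.createOrReplaceTempView("customers")
  spark.sql("SELECT COUNT(*) AS total_customers FROM customers").show()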