Re: AWS EMR SPARK 3.1.1 date issues

2021-08-29 Thread Gourav Sengupta
Hi Nicolas, thanks a ton for your kind response, I will surely try this out. Regards, Gourav Sengupta

On Sun, Aug 29, 2021 at 11:01 PM Nicolas Paris wrote:
> as a workaround turn off pruning :
>
> spark.sql.hive.metastorePartitionPruning false
> spark.sql.hive.convertMetastoreParquet false
>

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-29 Thread Nicolas Paris
as a workaround turn off pruning:

spark.sql.hive.metastorePartitionPruning false
spark.sql.hive.convertMetastoreParquet false

see https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/issues/45

On Tue Aug 24, 2021 at 9:18 AM CEST, Gourav Sengupta wrote:
> Hi,
>
> I
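One way to apply the two settings above is to pass them to spark-submit as --conf flags or to list them in spark-defaults.conf. The following is a minimal sketch that only renders the two properties named in the message in both forms; the helper names and the script name your_job.py are hypothetical, and the thread does not say how the settings were actually applied.

```python
# The two properties quoted in the message; both are set to "false".
WORKAROUND_CONF = {
    "spark.sql.hive.metastorePartitionPruning": "false",
    "spark.sql.hive.convertMetastoreParquet": "false",
}


def to_submit_args(conf):
    """Render Spark properties as spark-submit --conf arguments (hypothetical helper)."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


def to_defaults_lines(conf):
    """Render the same properties as spark-defaults.conf lines: key, space, value."""
    return [f"{key} {value}" for key, value in conf.items()]


if __name__ == "__main__":
    # your_job.py is a placeholder for the application being submitted.
    print(" ".join(["spark-submit"] + to_submit_args(WORKAROUND_CONF) + ["your_job.py"]))
    print("\n".join(to_defaults_lines(WORKAROUND_CONF)))
```

Keeping the properties in one dict means the command-line form and the defaults-file form cannot drift apart.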

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-24 Thread Gourav Sengupta
Hi, I received a response from AWS, this is an issue with EMR, and they are working on resolving the issue I believe. Thanks and Regards, Gourav Sengupta On Mon, Aug 23, 2021 at 1:35 PM Gourav Sengupta < gourav.sengupta.develo...@gmail.com> wrote: > Hi, > > the query still gives the same error

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-23 Thread Gourav Sengupta
Hi, the query still gives the same error if we write "SELECT * FROM table_name WHERE data_partition > CURRENT_DATE() - INTERVAL 10 DAYS". Also the queries work fine in SPARK 3.0.x, or in EMR 6.2.0. Thanks and Regards, Gourav Sengupta On Mon, Aug 23, 2021 at 1:16 PM Sean Owen wrote: > Date

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-23 Thread Sean Owen
Date handling was tightened up in Spark 3. I think you need to compare to a date literal, not a string literal. On Mon, Aug 23, 2021 at 5:12 AM Gourav Sengupta < gourav.sengupta.develo...@gmail.com> wrote: > Hi, > > while I am running in EMR 6.3.0 (SPARK 3.1.1) a simple query as "SELECT * > FROM
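To illustrate the suggestion of comparing against a date literal rather than a string literal, here is a minimal sketch that builds the query text with a Spark SQL DATE literal. The names table_name and data_partition are taken from another message in this thread; the helper names are hypothetical, and the sketch assumes data_partition is a DATE column. A later reply reports the same error with a date expression, so this shows only the literal form and is not a confirmed fix.

```python
from datetime import date, timedelta


def date_literal(d):
    """Format a date as a Spark SQL typed literal, e.g. DATE '2021-08-13'."""
    return f"DATE '{d.isoformat()}'"


def build_query(today, days_back=10):
    """Build a query that compares the partition column to a DATE literal.

    table_name and data_partition come from the thread; days_back is an
    illustrative parameter.
    """
    cutoff = today - timedelta(days=days_back)
    return f"SELECT * FROM table_name WHERE data_partition > {date_literal(cutoff)}"


if __name__ == "__main__":
    print(build_query(date.today()))
```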

Re: Aws

2019-02-08 Thread Pedro Tuero
Hi Noritaka, I start clusters from the Java API. Clusters running on 5.16 have no manual configurations in the EMR console Configuration tab, so I assume the value of this property is the default on 5.16. I enabled maximize resource allocation because otherwise, the number of cores

Re: Aws

2019-02-07 Thread Noritaka Sekiyama
Hi Pedro, It seems that you disabled maximize resource allocation in 5.16, but enabled it in 5.20. This setting can differ depending on how you start the EMR cluster (via quick wizard, advanced wizard in console, or CLI/API). You can see it in the EMR console Configuration tab. Please compare spark

Re: Aws

2019-02-07 Thread Hiroyuki Nagata
Hi, thank you Pedro. I tested the maximizeResourceAllocation option. When it's enabled, Spark seems to use the cores fully. However, the performance is not so different from the default setting. I am considering using s3-distcp for uploading files. And, I think table(dataframe) caching is also

Re: Aws

2019-02-01 Thread Pedro Tuero
Hi Hiroyuki, thanks for the answer. I found a solution for the cores per executor configuration: I set this configuration to true: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html#emr-spark-maximizeresourceallocation Probably it was true by default at version 5.16, but
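For reference, the setting described on the linked page is supplied to EMR as a configuration object with the "spark" classification. The sketch below builds that object as JSON; the function name is hypothetical, and how the JSON is handed to the cluster (console, CLI, or API) is left out because the thread does not specify it beyond starting clusters from the Java API.

```python
import json


def spark_maximize_allocation_config(enabled=True):
    """Build the EMR configuration list that toggles maximizeResourceAllocation.

    The function name is illustrative; the classification and property name
    follow the EMR Spark configuration page linked in the message.
    """
    return [
        {
            "Classification": "spark",
            "Properties": {
                # EMR configuration property values are strings.
                "maximizeResourceAllocation": "true" if enabled else "false",
            },
        }
    ]


if __name__ == "__main__":
    print(json.dumps(spark_maximize_allocation_config(), indent=2))
```

Setting the value explicitly avoids depending on whichever default a given EMR release or launch method applies.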

Re: Aws

2019-01-31 Thread Hiroyuki Nagata
Hi Pedro, I have also started using AWS EMR, with Spark 2.4.0. I'm looking for ways to tune performance. Do you configure dynamic allocation? FYI: https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation I've not tested it yet. I guess spark-submit needs to specify

Re: AWS credentials needed while trying to read a model from S3 in Spark

2018-05-09 Thread Srinath C
You could use IAM roles in AWS to access the data in S3 without credentials. See this link and this link for an

Re: AWS CLI --jars comma problem

2015-12-07 Thread Akhil Das
Not a direct answer but you can create a big fat jar combining all the classes in the three jars and pass it. Thanks Best Regards On Thu, Dec 3, 2015 at 10:21 PM, Yusuf Can Gürkan wrote: > Hello > > I have a question about AWS CLI for people who use it. > > I create a

Re: AWS-Credentials fails with org.apache.hadoop.fs.s3.S3Exception: FORBIDDEN

2015-05-08 Thread Akhil Das
Have a look at this SO question: http://stackoverflow.com/questions/24048729/how-to-read-input-from-s3-in-a-spark-streaming-ec2-cluster-application It discusses various ways of accessing S3. Thanks Best Regards On Fri, May 8, 2015 at 1:21 AM, in4maniac sa...@skimlinks.com wrote: Hi

Re: AWS-Credentials fails with org.apache.hadoop.fs.s3.S3Exception: FORBIDDEN

2015-05-08 Thread in4maniac
Hi guys, I realised that it was a bug in my code that caused it to break. I was running the filter on a SchemaRDD when I was supposed to be running it on an RDD. But I still don't understand why the stderr was about an S3 request rather than a type-checking error such as No tuple position

Re: AWS SDK HttpClient version conflict (spark.files.userClassPathFirst not working)

2015-03-12 Thread 浦野 裕也
Hi Adam, Could you try building Spark with the profile -Pkinesis-asl: mvn -Pkinesis-asl -DskipTests clean package. Refer to the 'Running the Example' section: https://spark.apache.org/docs/latest/streaming-kinesis-integration.html In fact, I've seen the same issue and have been able to use the AWS SDK by

Re: AWS Credentials for private S3 reads

2014-07-02 Thread Matei Zaharia
When you use hadoopConfiguration directly, I don’t think you have to replace the “/” with “%2f”. Have you tried it without that? Also make sure you’re not replacing slashes in the URL itself. Matei On Jul 2, 2014, at 4:17 PM, Brian Gawalt bgaw...@gmail.com wrote: Hello everyone, I'm
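The distinction above can be sketched as follows: percent-encoding matters only when a secret is embedded in the URL, while a value set through the Hadoop configuration is passed through unchanged. This is an illustration under assumptions: the thread does not say which filesystem scheme was used, so the s3n scheme and its property names fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey are assumed here, and the helper names, key values, bucket, and path are made up.

```python
from urllib.parse import quote


def credentials_in_url(access_key, secret_key, bucket, path):
    """Embed credentials in the URL; special characters must be percent-encoded.

    Only the credentials are encoded; the slashes of the path itself stay as-is.
    """
    user = quote(access_key, safe="")
    password = quote(secret_key, safe="")
    return f"s3n://{user}:{password}@{bucket}/{path}"


def credentials_as_conf(access_key, secret_key):
    """Return Hadoop properties with the raw, unencoded key values.

    The s3n property names are an assumption; the thread does not name them.
    """
    return {
        "fs.s3n.awsAccessKeyId": access_key,
        "fs.s3n.awsSecretAccessKey": secret_key,
    }


if __name__ == "__main__":
    # Placeholder credentials, not real keys.
    print(credentials_in_url("AKIAEXAMPLE", "abc/def+ghi", "my-bucket", "dir/file.txt"))
    print(credentials_as_conf("AKIAEXAMPLE", "abc/def+ghi"))
```

With a live SparkContext, each entry of the returned dict would be applied with hadoopConfiguration.set(key, value), leaving the "/" in the secret untouched.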

Re: AWS Spark-ec2 script with different user

2014-04-09 Thread Nicholas Chammas
Marco, If you call spark-ec2 launch without specifying an AMI, it will default to the Spark-provided AMI. Nick On Wed, Apr 9, 2014 at 9:43 AM, Marco Costantini silvio.costant...@granatads.com wrote: Hi there, To answer your question; no there is no reason NOT to use an AMI that Spark has

Re: AWS Spark-ec2 script with different user

2014-04-09 Thread Nicholas Chammas
And for the record, that AMI is ami-35b1885c. Again, you don't need to specify it explicitly; spark-ec2 will default to it. On Wed, Apr 9, 2014 at 11:08 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Marco, If you call spark-ec2 launch without specifying an AMI, it will default to

Re: AWS Spark-ec2 script with different user

2014-04-09 Thread Marco Costantini
Ah, tried that. I believe this is an HVM AMI? We are exploring paravirtual AMIs. On Wed, Apr 9, 2014 at 11:17 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: And for the record, that AMI is ami-35b1885c. Again, you don't need to specify it explicitly; spark-ec2 will default to it.

Re: AWS Spark-ec2 script with different user

2014-04-09 Thread Shivaram Venkataraman
The AMI should automatically switch between PVM and HVM based on the instance type you specify on the command line. For reference (note you don't need to specify this on the command line), the PVM ami id is ami-5bb18832 in us-east-1. FWIW we maintain the list of AMI Ids (across regions and pvm,

Re: AWS Spark-ec2 script with different user

2014-04-08 Thread Marco Costantini
Another thing I didn't mention. The AMI and user used: naturally I've created several of my own AMIs with the following characteristics. None of which worked. 1) Enabling ssh as root as per this guide ( http://blog.tiger-workshop.com/enable-root-access-on-amazon-ec2-instance/). When doing this, I

Re: AWS Spark-ec2 script with different user

2014-04-08 Thread Marco Costantini
I was able to keep the workaround ...around... by overwriting the generated '/root/.ssh/authorized_keys' file with a known good one, in the '/etc/rc.local' file On Tue, Apr 8, 2014 at 10:12 AM, Marco Costantini silvio.costant...@granatads.com wrote: Another thing I didn't mention. The AMI and

Re: AWS Spark-ec2 script with different user

2014-04-07 Thread Marco Costantini
Hi Shivaram, OK so let's assume the script CANNOT take a different user and that it must be 'root'. The typical workaround is as you said, allow the ssh with the root user. Now, don't laugh, but, this worked last Friday, but today (Monday) it no longer works. :D Why? ... ...It seems that NOW,

Re: AWS Spark-ec2 script with different user

2014-04-07 Thread Shivaram Venkataraman
Hmm -- That is strange. Can you paste the command you are using to launch the instances ? The typical workflow is to use the spark-ec2 wrapper script using the guidelines at http://spark.apache.org/docs/latest/ec2-scripts.html Shivaram On Mon, Apr 7, 2014 at 1:53 PM, Marco Costantini