On 3 May 2016 at 17:22, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
> Hi,
>
> The best thing to do is start the EMR clusters with proper permissions in
> the roles; that way you do not need to worry about the keys at all.
>
> Another thing: why are we using s3a:// instead of s3:// ?

Probably because of what's said about s3:// and s3n:// here (which is why I
use s3a://): https://wiki.apache.org/hadoop/AmazonS3

Regards,
James

> Besides that, you can increase S3 speeds using the instructions mentioned
> here:
> https://aws.amazon.com/blogs/aws/aws-storage-update-amazon-s3-transfer-acceleration-larger-snowballs-in-more-regions/
>
> Regards,
> Gourav
>
> On Tue, May 3, 2016 at 12:04 PM, Steve Loughran <ste...@hortonworks.com>
> wrote:
>
>> Don't put your secret in the URI; it'll only creep out in the logs.
>>
>> Use the specific properties covered in
>> http://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html,
>> which you can set in your Spark context by prefixing them with spark.hadoop.
>>
>> You can also set the env vars AWS_ACCESS_KEY_ID and
>> AWS_SECRET_ACCESS_KEY; SparkEnv will pick these up and set the relevant
>> Spark context keys for you.
>>
>> On 3 May 2016, at 01:53, Zhang, Jingyu <jingyu.zh...@news.com.au> wrote:
>>
>> Hi All,
>>
>> I am using Eclipse with Maven for developing Spark applications. I got an
>> error reading from S3 in Scala, but it works fine in Java when I run
>> them in the same project in Eclipse.
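For reference, Steve's suggestion — keep the credentials out of the s3a:// URI and pass the fs.s3a.* properties through the spark.hadoop. prefix instead — might look like this in Scala. This is a minimal sketch, not the poster's actual code: the app name, bucket path, and the `key`/`seckey` variables are placeholders, and it assumes hadoop-aws on the classpath (fs.s3a.access.key and fs.s3a.secret.key are the property names the s3a connector reads):

```scala
import java.io.InputStream
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

// Credentials go into the Hadoop configuration via the spark.hadoop. prefix,
// not into the s3a:// URI, so they never leak into logs or stack traces.
val conf = new SparkConf()
  .setAppName("GraphCluster")                     // placeholder app name
  .set("spark.hadoop.fs.s3a.access.key", key)     // placeholder variable
  .set("spark.hadoop.fs.s3a.secret.key", seckey)  // placeholder variable
val ctx = new SparkContext(conf)

// The URI itself now carries no secrets at all.
val pt = new Path("s3a://graphclustering/config.properties")
val fs = FileSystem.get(pt.toUri, ctx.hadoopConfiguration)
val inputStream: InputStream = fs.open(pt)
```

Alternatively, as Steve notes, export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY before launching and let SparkEnv propagate them into the context for you.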
>> The Scala/Java code and the errors follow.
>>
>> Scala:
>>
>> val uri = URI.create("s3a://" + key + ":" + seckey + "@" +
>>   "graphclustering/config.properties")
>> val pt = new Path("s3a://" + key + ":" + seckey + "@" +
>>   "graphclustering/config.properties")
>> val fs = FileSystem.get(uri, ctx.hadoopConfiguration)
>> val inputStream: InputStream = fs.open(pt)
>>
>> ---- Exception: on aws-java-1.7.4 and hadoop-aws-2.6.1 ----
>>
>> Exception in thread "main"
>> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service:
>> Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID:
>> 8A56DC7BF0BFF09A), S3 Extended Request ID
>>
>> at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1160)
>> at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:748)
>> at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:467)
>> at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:302)
>> at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
>> at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1050)
>> at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1027)
>> at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:688)
>> at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:222)
>> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
>> at com.news.report.graph.GraphCluster$.main(GraphCluster.scala:53)
>> at com.news.report.graph.GraphCluster.main(GraphCluster.scala)
>>
>> 16/05/03 10:49:17 INFO SparkContext: Invoking stop() from shutdown hook
>> 16/05/03 10:49:17 INFO SparkUI: Stopped Spark web UI at http://10.65.80.125:4040
>> 16/05/03 10:49:17 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
>> 16/05/03 10:49:17 INFO MemoryStore: MemoryStore cleared
>> 16/05/03 10:49:17 INFO BlockManager: BlockManager stopped
>>
>> ---- Exception: on aws-java-1.7.4 and hadoop-aws-2.7.2 ----
>>
>> 16/05/03 10:23:40 INFO Slf4jLogger: Slf4jLogger started
>> 16/05/03 10:23:40 INFO Remoting: Starting remoting
>> 16/05/03 10:23:40 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.65.80.125:61860]
>> 16/05/03 10:23:40 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 61860.
>> 16/05/03 10:23:40 INFO SparkEnv: Registering MapOutputTracker
>> 16/05/03 10:23:40 INFO SparkEnv: Registering BlockManagerMaster
>> 16/05/03 10:23:40 INFO DiskBlockManager: Created local directory at /private/var/folders/sc/tdmkbvr1705b8p70xqj1kqks5l9p
>> 16/05/03 10:23:40 INFO MemoryStore: MemoryStore started with capacity 1140.4 MB
>> 16/05/03 10:23:40 INFO SparkEnv: Registering OutputCommitCoordinator
>> 16/05/03 10:23:40 INFO Utils: Successfully started service 'SparkUI' on port 4040.
>> 16/05/03 10:23:40 INFO SparkUI: Started SparkUI at http://10.65.80.125:4040
>> 16/05/03 10:23:40 INFO Executor: Starting executor ID driver on host localhost
>> 16/05/03 10:23:40 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 61861.
>> 16/05/03 10:23:40 INFO NettyBlockTransferService: Server created on 61861
>> 16/05/03 10:23:40 INFO BlockManagerMaster: Trying to register BlockManager
>> 16/05/03 10:23:40 INFO BlockManagerMasterEndpoint: Registering block manager localhost:61861 with 1140.4 MB RAM, BlockManagerId(driver, localhost, 61861)
>> 16/05/03 10:23:40 INFO BlockManagerMaster: Registered BlockManager
>>
>> Exception in thread "main" java.lang.NoSuchMethodError:
>> com.amazonaws.services.s3.transfer.TransferManagerConfiguration.setMultipartUploadThreshold(I)V
>>
>> at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:285)
>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>> at com.news.report.graph.GraphCluster$.main(GraphCluster.scala:52)
>> at com.news.report.graph.GraphCluster.main(GraphCluster.scala)
>>
>> 16/05/03 10:23:51 INFO SparkContext: Invoking stop() from shutdown hook
>> 16/05/03 10:23:51 INFO SparkUI: Stopped Spark web UI at http://10.65.80.125:4040
>> 16/05/03 10:23:51 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
>> 16/05/03 10:23:51 INFO MemoryStore: MemoryStore cleared
>> 16/05/03 10:23:51 INFO BlockManager: BlockManager stopped
>> 16/05/03 10:23:51 INFO BlockManagerMaster: BlockManagerMaster stopped
>> 16/05/03 10:23:51 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
>> 16/05/03 10:23:51 INFO SparkContext: Successfully stopped SparkContext
>> 16/05/03 10:23:51 INFO ShutdownHookManager: Shutdown hook called
>> 16/05/03 10:23:51 INFO ShutdownHookManager: Deleting directory /private/var/folders/sc/tdmkbvr1705b8p70xqj1kqks5l9pk9/T/spark-53cf244a-2947-48c9-ba97-7302c9985f35
>>
>> 16/05/03 10:49:17 INFO S3AFileSystem: Caught an AmazonServiceException, which means your request made it to Amazon S3, but was rejected with an error response for some reason.
>> 16/05/03 10:49:17 INFO S3AFileSystem: Error Message: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 8A56DC7BF0BFF09A)
>> 16/05/03 10:49:17 INFO S3AFileSystem: HTTP Status Code: 403
>> 16/05/03 10:49:17 INFO S3AFileSystem: AWS Error Code: 403 Forbidden
>> 16/05/03 10:49:17 INFO S3AFileSystem: Error Type: Client
>> 16/05/03 10:49:17 INFO S3AFileSystem: Request ID: 8A56DC7BF0BFF09A
>> 16/05/03 10:49:17 INFO S3AFileSystem: Class Name: com.amazonaws.services.s3.model.AmazonS3Exception
>>
>> But the Java code works without error:
>>
>> URI uri = URI.create("s3a://" + key + ":" + seckey + "@" +
>>   "graphclustering/config.properties");
>> Path pt = new Path("s3a://" + key + ":" + seckey + "@" +
>>   "graphclustering/config.properties");
>> FileSystem fs = FileSystem.get(uri, ctx.hadoopConfiguration());
>> inputStream = fs.open(pt);
>>
>> Thanks,
>>
>> Jingyu
>>
>> This message and its attachments may contain legally privileged or
>> confidential information. It is intended solely for the named addressee. If
>> you are not the addressee indicated in this message or responsible for
>> delivery of the message to the addressee, you may not copy or deliver this
>> message or its attachments to anyone. Rather, you should permanently delete
>> this message and its attachments and kindly notify the sender by reply
>> e-mail.
>> Any content of this message and its attachments which does not
>> relate to the official business of the sending company must be taken not to
>> have been sent or endorsed by that company or any of its related entities.
>> No warranty is made that the e-mail or attachments are free from computer
>> virus or other defect.