Re: Long running application failed to init containers due to authentication errors

2019-03-23 Thread Billy Watson
Just a hunch, because we’ve been dealing with something similar: when the
failure occurs, has the resource manager also failed over recently, i.e.
within the previous 24 hours?

One thing to try: catch this exception and manually fail over to the new
master/resource manager.
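
A minimal sketch of what that catch-and-retry could look like on the client
side (this is an illustration, not Flink's or Hadoop's actual recovery code;
the exception-message match, backoff, and retry count are assumptions, and the
RM failover itself is normally handled by the client's failover proxy once the
call is retried):

    import java.io.IOException;
    import java.util.concurrent.Callable;

    public final class FailoverRetry {

        /** Runs an RM call; on the "client cannot authenticate" error it backs off
         *  and retries, giving the client a chance to reconnect to whichever
         *  ResourceManager is now active. */
        public static <T> T callWithRetry(Callable<T> rmCall, int maxAttempts) throws Exception {
            for (int attempt = 1; ; attempt++) {
                try {
                    return rmCall.call();
                } catch (IOException e) {
                    boolean authFailure = e.getMessage() != null
                            && e.getMessage().contains("client cannot authenticate");
                    if (!authFailure || attempt >= maxAttempts) {
                        throw e;              // not the failure discussed here, or out of retries
                    }
                    Thread.sleep(5000L);      // back off before retrying against the new active RM
                }
            }
        }
    }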

- Billy Watson

On Thu, Nov 29, 2018 at 21:16 Paul Lam  wrote:

> Hi,
>
> I’m running Flink applications on YARN 2.6.0-cdh5.6.0 and have run into a
> situation. After running for a while (possibly longer than 7 days), the
> application might need to rescale up or recover from a node failure, but it
> is not able to allocate new containers. All the incoming containers fail to
> localize resources and to create log aggregation dirs for lack of
> credentials, so the Flink application never gets the requested containers.
> It seems that the credentials in the container launch context somehow
> disappear.
>
> This looks very similar to FLINK-6376 [1] and YARN-2704 [2], but both of
> them should have been fixed. The Flink AM gets the HDFS delegation token
> from the client, puts it into the container launch context, and will not
> refresh it afterwards. But IMHO, if the token were expired, the exception
> should be “token expired” or “token not found in cache”, whereas what I
> actually get is “client cannot authenticate via [token, kerberos]”.
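
(For reference, the standard way a YARN application master ships credentials
to its containers looks roughly like the sketch below. This is generic
Hadoop/YARN API usage, not Flink's actual code, and the helper class name is
made up for illustration. If a launch context built this way is reused long
after the tokens it captured have expired, failures like the ones described
above would be consistent with that.)

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import org.apache.hadoop.io.DataOutputBuffer;
    import org.apache.hadoop.security.Credentials;
    import org.apache.hadoop.security.UserGroupInformation;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

    public final class LaunchContextTokens {

        /** Serializes the current user's tokens (including any HDFS delegation
         *  token) and attaches them to a container launch context. */
        public static void attachTokens(ContainerLaunchContext ctx) throws IOException {
            Credentials credentials = UserGroupInformation.getCurrentUser().getCredentials();
            DataOutputBuffer dob = new DataOutputBuffer();
            credentials.writeTokenStorageToStream(dob);
            ctx.setTokens(ByteBuffer.wrap(dob.getData(), 0, dob.getLength()));
        }
    }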
>
> This happens very randomly, and I have been struggling with it for a couple
> of days. Any help would be greatly appreciated. Thanks a lot!
>
> [1] https://issues.apache.org/jira/browse/FLINK-6376
> [2] https://issues.apache.org/jira/browse/YARN-2704
>
> Best,
> Paul Lam
>
>
> --
William Watson


Re: Regarding containers not launching

2018-01-31 Thread Billy Watson
Also, is there anything interesting in the yarn scheduler logs? Something
about scheduling being skipped?




On Wed, Jan 31, 2018 at 05:16 Billy Watson <williamrwat...@gmail.com> wrote:

> Ok, and your container settings?
>
> On Wed, Jan 31, 2018 at 02:38 nishchay malhotra <
> nishchay.malht...@gmail.com> wrote:
>
>> Yes, my job has about 160,000 maps and my cluster is not getting fully
>> utilized. Around 6,000 maps ran for 2 hrs and then I killed the job. At any
>> point in time only 40 containers are running, which is just 11% of my cluster
>> capacity.
>>
>> {
>>   "classification": "mapred-site",
>>   "properties": {
>>     "mapreduce.job.reduce.slowstart.completedmaps": "1",
>>     "mapreduce.reduce.memory.mb": "3072",
>>     "mapreduce.map.memory.mb": "2208",
>>     "mapreduce.map.java.opts": "-Xmx1800m",
>>     "mapreduce.map.cpu.vcores": "1"
>>   }
>> },
>> {
>>   "classification": "yarn-site",
>>   "properties": {
>>     "yarn.scheduler.minimum-allocation-mb": "32",
>>     "yarn.scheduler.maximum-allocation-mb": "253952",
>>     "yarn.scheduler.maximum-allocation-vcores": "128",
>>     "yarn.nodemanager.vmem-pmem-ratio": "3",
>>     "yarn.nodemanager.vmem-check-enabled": "true",
>>     "yarn.nodemanager.resource.cpu-vcores": "16",
>>     "yarn.nodemanager.resource.memory-mb": "23040"
>>   }
>> }
>>
>> Each node capacity:
>> disk space = 100 GB
>> memory = 28 GB
>> processors = 8
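
(A back-of-the-envelope check of what that configuration should allow, using
the numbers quoted above and assuming scheduling is memory-bound; the 24-node
figure comes from earlier in the thread:)

    public class ContainerEstimate {
        public static void main(String[] args) {
            int nodes = 24;               // cluster size mentioned in the thread
            int nmMemoryMb = 23040;       // yarn.nodemanager.resource.memory-mb
            int mapContainerMb = 2208;    // mapreduce.map.memory.mb
            int perNode = nmMemoryMb / mapContainerMb;          // 10 map containers per node
            System.out.println("per node: " + perNode);
            System.out.println("cluster:  " + perNode * nodes); // ~240, vs. the 40 observed
        }
    }

If the theoretical ceiling is around 240 map containers but only 40 are
running, the limit is probably elsewhere (queue capacity, user limits, or the
scheduler), which is why the scheduler logs mentioned above are worth checking.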
>>
>>
>> --
> William Watson
>
> --
William Watson


Re: Regarding containers not launching

2018-01-31 Thread Billy Watson
Ok, and your container settings?

On Wed, Jan 31, 2018 at 02:38 nishchay malhotra 
wrote:

> Yes, my job has about 160,000 maps and my cluster is not getting fully
> utilized. Around 6,000 maps ran for 2 hrs and then I killed the job. At any
> point in time only 40 containers are running, which is just 11% of my cluster
> capacity.
>
> {
>   "classification": "mapred-site",
>   "properties": {
>     "mapreduce.job.reduce.slowstart.completedmaps": "1",
>     "mapreduce.reduce.memory.mb": "3072",
>     "mapreduce.map.memory.mb": "2208",
>     "mapreduce.map.java.opts": "-Xmx1800m",
>     "mapreduce.map.cpu.vcores": "1"
>   }
> },
> {
>   "classification": "yarn-site",
>   "properties": {
>     "yarn.scheduler.minimum-allocation-mb": "32",
>     "yarn.scheduler.maximum-allocation-mb": "253952",
>     "yarn.scheduler.maximum-allocation-vcores": "128",
>     "yarn.nodemanager.vmem-pmem-ratio": "3",
>     "yarn.nodemanager.vmem-check-enabled": "true",
>     "yarn.nodemanager.resource.cpu-vcores": "16",
>     "yarn.nodemanager.resource.memory-mb": "23040"
>   }
> }
>
> Each node capacity:
> disk space = 100 GB
> memory = 28 GB
> processors = 8
>
>
> --
William Watson


Re: Regarding containers not launching

2018-01-30 Thread Billy Watson
Is your job able to use more containers, i.e. does your job have tasks
waiting or are all tasks in progress?
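
If it helps, a few standard CLI checks for this (the job ID is a placeholder):

    yarn node -list              # node state and number of running containers per node
    yarn application -list       # running applications and their progress
    mapred job -status <job_id>  # map/reduce completion for a specific MR job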

William Watson


On Tue, Jan 30, 2018 at 1:56 AM, nishchay malhotra <
nishchay.malht...@gmail.com> wrote:

> What should I be looking for if my 24-node cluster is not launching enough
> containers?
> Only 40/288 cores are used and 87 GB/700 GB of memory is used.
> The yarn.nodemanager memory/core settings look good, and so do the container
> memory/core settings.
>
> Thanks
> Nishchay Malhotra
>


Re: Install a hadoop cluster manager for open source hadoop 2.7.3

2017-07-27 Thread Billy Watson
Nishant,

Sorry about the late reply. You may want to check out
https://ambari.apache.org/mail-lists.html to see if the Ambari user list
can answer your question better.

William Watson
Lead Software Engineer
J.D. Power O2O
http://www.jdpower.com/data-and-analytics/media-and-marketing-solutions-o2o

On Wed, Jul 19, 2017 at 8:31 AM, Nishant Verma 
wrote:

> Hello,
>
> I have a 5 node hadoop cluster installed on AWS EC2 instances. I have
> installed open source hadoop 2.7.3 version.
>
> I am now trying to set up a cluster manager for this. I tried setting up
> Ambari. I followed the steps present in
> https://cwiki.apache.org/confluence/display/AMBARI/Installation+Guide+for+Ambari+2.5.1
> for building Ambari from source, but I am stuck with multiple build failures
> at the ambari-agent and ambari-metrics steps.
>
> Don't we have binaries available for Ambari to be used on top of open
> source hadoop clusters?
>
> I followed the Ambari repository download options from
> https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_ambari_reference_guide/content/_download_the_ambari_repo.html
> and was able to start postgres and ambari-server, but I am getting errors
> while starting ambari-agents on my namenode and datanodes. I am not sure
> whether downloading the repos from this link was the right approach for my
> case.
>
> Any insight or help would be appreciated.
>
> Regards
> Nishant
>


Re: fs.s3a.endpoint not working

2016-02-17 Thread Billy Watson
Thanks for following up!

William Watson
Lead Software Engineer

On Tue, Feb 16, 2016 at 5:08 PM, Phillips, Caleb <caleb.phill...@nrel.gov>
wrote:

> Hi All,
>
> Just wanted to follow up: we got this working with the help of the
> object storage vendor. After running in circles for a bit, the issue seems
> to have been as simple as using the correct FQDN in the endpoint fields and
> disabling SSL. We used the JetS3t properties, but it turns out those aren’t
> actually needed with recent Hadoop versions (?).
>
> For anyone who might be having similar issues, here is the relevant
> configuration in core-site.xml for S3A and S3N with Hadoop 2.7.1:
>
> <configuration>
>
> <property>
>   <name>fs.s3n.awsAccessKeyId</name>
>   <description>AWS access key ID</description>
>   <value>yourusername</value>
> </property>
>
> <property>
>   <name>fs.s3n.awsSecretAccessKey</name>
>   <description>AWS secret key</description>
>   <value>sweetpassword</value>
> </property>
>
> <property>
>   <name>fs.s3n.endpoint</name>
>   <value>youre.fqdn.here</value>
> </property>
>
> <property>
>   <name>fs.s3n.ssl.enabled</name>
>   <value>false</value>
> </property>
>
> <property>
>   <name>fs.s3a.access.key</name>
>   <description>AWS access key ID. Omit for Role-based authentication.</description>
>   <value>yourusername</value>
> </property>
>
> <property>
>   <name>fs.s3a.secret.key</name>
>   <description>AWS secret key. Omit for Role-based authentication.</description>
>   <value>sweetpassword</value>
> </property>
>
> <property>
>   <name>fs.s3a.connection.ssl.enabled</name>
>   <value>false</value>
>   <description>Enables or disables SSL connections to S3.</description>
> </property>
>
> <property>
>   <name>fs.s3a.endpoint</name>
>   <description>AWS S3 endpoint to connect to. An up-to-date list is
>   provided in the AWS Documentation: regions and endpoints. Without this
>   property, the standard region (s3.amazonaws.com) is assumed.</description>
>   <value>your.fqdn.here</value>
> </property>
>
> </configuration>
>
> Also, as mentioned previously in the thread, it’s necessary to add some
> things to your HADOOP_CLASSPATH:
>
> export
> HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/path/to/hadoop-2.7.1/share/hadoop/tools/lib/*
>
> You can test by:
>
> s3cmd mb s3://some-bucket   # <- note that you have to do this
> with s3cmd, not hadoop, at least with our object store
> hadoop fs -ls s3n://some-bucket/
> hadoop fs -ls s3a://some-bucket/
> hadoop distcp /your/favorite/hdfs/data s3a://some-bucket/
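
(A programmatic version of the same check, in case it helps anyone debugging a
similar setup: a minimal sketch against the standard Hadoop FileSystem API, not
part of the original message. The bucket name and endpoint are placeholders.)

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class S3AEndpointCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml from the classpath
            conf.set("fs.s3a.endpoint", "your.fqdn.here");
            conf.set("fs.s3a.connection.ssl.enabled", "false");
            FileSystem fs = FileSystem.get(URI.create("s3a://some-bucket/"), conf);
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());   // lists the bucket root
            }
        }
    }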
>
> HTH,
>
> --
> Caleb Phillips, Ph.D.
> Data Scientist | Computational Science Center
>
> National Renewable Energy Laboratory (NREL)
> 15013 Denver West Parkway | Golden, CO 80401
> 303-275-4297 | caleb.phill...@nrel.gov
>
> From: Billy Watson <williamrwat...@gmail.com>
> Date: Tuesday, January 19, 2016 at 8:41 AM
> To: Alexander Pivovarov <apivova...@gmail.com>
> Cc: Caleb Phillips <caleb.phill...@nrel.gov>, "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: Re: fs.s3a.endpoint not working
>
> Stupid question, I assume you're using a URL that starts with s3a and that
> your custom endpoint supports s3a?
>
> William Watson
> Lead Software Engineer
>
> On Thu, Jan 14, 2016 at 1:57 PM, Alexander Pivovarov <apivova...@gmail.com> wrote:
>
> http://www.jets3t.org/toolkit/configuration.html
>
> On Jan 14, 2016 10:56 AM, "Alexander Pivovarov" <apivova...@gmail.com> wrote:
>
> Add jets3t.properties file with s3service.s3-endpoint= to
> /etc/hadoop/conf folder
>
> The folder with the file should be in HADOOP_CLASSPATH
>
> JetS3t library which is used by hadoop is looking for this file.
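
For anyone following along, such a file would look roughly like this (the
s3service.* keys are from the JetS3t configuration page linked above; the
endpoint value is a placeholder):

    # /etc/hadoop/conf/jets3t.properties
    s3service.s3-endpoint=object-store.example.com
    s3service.https-only=false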
>
> On Dec 22, 2015 12:39 PM, "Phillips, Caleb" <caleb.phill...@nrel.gov> wrote:
> Hi All,
>
> New to this list. Looking for a bit of help:
>
> I'm having trouble connecting Hadoop to an S3-compatible (non-AWS) object
> store.
>
> This issue was discussed, but left unresolved, in this thread:
>
>
> https://mail-archives.apache.org/mod_mbox/spark-user/201507.mbox/%3cca+0w_au5es_flugzmgwkkga3jya1asi3u+isjcuymfntvnk...@mail.gmail.com%3E
>
> And here, on Cloudera's forums (the second post is mine):
>
>
> https://community.cloudera.com/t5/Data-Ingestion-Integration/fs-s3a-endpoint-ignored-in-hdfs-site-xml/m-p/33694#M1180
>
> I'm running Hadoop 2.6.3 with Java 1.8 (65) on a Linux host. Using Hadoop,
> I'm able to connect to S3 on AWS, and e.g., list/put/get files.
>
> However, when I point the fs.s3a.endpoint configuration directive at my
> non-AWS S3-compatible object storage, it appears to still point at (and
> authenticate against) AWS.
>
> I've checked and double-checked my credentials and configuration using
> both Python's boto library and the s3cmd tool, both of which connect to
> this non-AWS data store just fine.
>
> Any help would be much appreciated. Thanks!
>
> --
> Caleb Phillips, Ph.D.
> Data Scientist | Computational Science Center
>
> National Renewable Energy Laboratory (NREL)
> 15013 Denver West Parkway | Golden, CO 80401
> 303-275-4297 | caleb.phill...@nrel.gov
>
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
>
>
>


Re: fs.s3a.endpoint not working

2016-01-19 Thread Billy Watson
Stupid question, I assume you're using a URL that starts with s3a and that
your custom endpoint supports s3a?

William Watson
Lead Software Engineer

On Thu, Jan 14, 2016 at 1:57 PM, Alexander Pivovarov 
wrote:

> http://www.jets3t.org/toolkit/configuration.html
> On Jan 14, 2016 10:56 AM, "Alexander Pivovarov" 
> wrote:
>
>> Add jets3t.properties file with s3service.s3-endpoint= to
>> /etc/hadoop/conf folder
>>
>> The folder with the file should be in HADOOP_CLASSPATH
>>
>> JetS3t library which is used by hadoop is looking for this file.
>> On Dec 22, 2015 12:39 PM, "Phillips, Caleb" 
>> wrote:
>>
>>> Hi All,
>>>
>>> New to this list. Looking for a bit of help:
>>>
>>> I'm having trouble connecting Hadoop to a S3-compatable (non AWS) object
>>> store.
>>>
>>> This issue was discussed, but left unresolved, in this thread:
>>>
>>>
>>> https://mail-archives.apache.org/mod_mbox/spark-user/201507.mbox/%3cca+0w_au5es_flugzmgwkkga3jya1asi3u+isjcuymfntvnk...@mail.gmail.com%3E
>>>
>>> And here, on Cloudera's forums (the second post is mine):
>>>
>>>
>>> https://community.cloudera.com/t5/Data-Ingestion-Integration/fs-s3a-endpoint-ignored-in-hdfs-site-xml/m-p/33694#M1180
>>>
>>> I'm running Hadoop 2.6.3 with Java 1.8 (65) on a Linux host. Using
>>> Hadoop, I'm able to connect to S3 on AWS, and e.g., list/put/get files.
>>>
>>> However, when I point the fs.s3a.endpoint configuration directive at my
>>> non-AWS S3-Compatable object storage, it appears to still point at (and
>>> authenticate against) AWS.
>>>
>>> I've checked and double-checked my credentials and configuration using
>>> both Python's boto library and the s3cmd tool, both of which connect to
>>> this non-AWS data store just fine.
>>>
>>> Any help would be much appreciated. Thanks!
>>>
>>> --
>>> Caleb Phillips, Ph.D.
>>> Data Scientist | Computational Science Center
>>>
>>> National Renewable Energy Laboratory (NREL)
>>> 15013 Denver West Parkway | Golden, CO 80401
>>> 303-275-4297 | caleb.phill...@nrel.gov
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
>>> For additional commands, e-mail: user-h...@hadoop.apache.org
>>>
>>>


Re: fs.s3a.endpoint not working

2016-01-11 Thread Billy Watson
One of the threads suggested using the core-site.xml. Did you try putting
your configuration in there?
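
For example, something along these lines (a sketch; the endpoint value is a
placeholder for your object store's FQDN):

    <property>
      <name>fs.s3a.endpoint</name>
      <value>object-store.example.com</value>
    </property>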

One thing I've noticed is that the AWS stuff is handled by an underlying
library (I think jets3t in < 2.6 versions; I forget what in 2.6+), and when I
was trying to mess with stuff and spelunking through the hadoop code, I
kept running into blocks with that library.

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Jan 11, 2016 at 10:39 AM, Phillips, Caleb 
wrote:

> Hi All,
>
> Just wanted to send this out again since there was no response
> (admittedly, originally sent in the midst of the US holiday season) and it
> seems to be an issue that continues to come up (see e.g., the email from
> Han Ju on Jan 5).
>
> If anyone has successfully connected Hadoop to a non-AWS S3-compatible
> object store, it’d be very helpful to hear how you made it work. The
> fs.s3a.endpoint configuration directive appears non-functional at our site
> (with Hadoop 2.6.3).
>
> --
> Caleb Phillips, Ph.D.
> Data Scientist | Computational Science Center
>
> National Renewable Energy Laboratory (NREL)
> 15013 Denver West Parkway | Golden, CO 80401
> 303-275-4297 | caleb.phill...@nrel.gov
>
>
>
>
>
>
> On 12/22/15, 1:39 PM, "Phillips, Caleb"  wrote:
>
> >Hi All,
> >
> >New to this list. Looking for a bit of help:
> >
> >I'm having trouble connecting Hadoop to a S3-compatable (non AWS) object
> >store.
> >
> >This issue was discussed, but left unresolved, in this thread:
> >
> >
> https://mail-archives.apache.org/mod_mbox/spark-user/201507.mbox/%3CCA+0W_
> >au5es_flugzmgwkkga3jya1asi3u+isjcuymfntvnk...@mail.gmail.com%3E
> >
> >And here, on Cloudera's forums (the second post is mine):
> >
> >
> https://community.cloudera.com/t5/Data-Ingestion-Integration/fs-s3a-endpoi
> >nt-ignored-in-hdfs-site-xml/m-p/33694#M1180
> >
> >I'm running Hadoop 2.6.3 with Java 1.8 (65) on a Linux host. Using
> >Hadoop, I'm able to connect to S3 on AWS, and e.g., list/put/get files.
> >
> >However, when I point the fs.s3a.endpoint configuration directive at my
> >non-AWS S3-Compatable object storage, it appears to still point at (and
> >authenticate against) AWS.
> >
> >I've checked and double-checked my credentials and configuration using
> >both Python's boto library and the s3cmd tool, both of which connect to
> >this non-AWS data store just fine.
> >
> >Any help would be much appreciated. Thanks!
> >
> >--
> >Caleb Phillips, Ph.D.
> >Data Scientist | Computational Science Center
> >
> >National Renewable Energy Laboratory (NREL)
> >15013 Denver West Parkway | Golden, CO 80401
> >303-275-4297 | caleb.phill...@nrel.gov
> >
> >-
> >To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> >For additional commands, e-mail: user-h...@hadoop.apache.org
> >
>
>


Re: can't submit remote job

2015-05-18 Thread Billy Watson
Netflix Genie is what we use for submitting jobs.

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, May 18, 2015 at 7:07 AM, xeonmailinglist-gmail 
xeonmailingl...@gmail.com wrote:

  I also can't find a good site/book that explains well how to submit
 remote jobs. Also, does anyone know where I can get more useful info?



  Forwarded Message   Subject: can't submit remote job  Date:
 Mon, 18 May 2015 11:54:56 +0100  From: xeonmailinglist-gmail
 <xeonmailingl...@gmail.com>  To: user@hadoop.apache.org

  Hi,

 I am trying to submit a remote job in YARN MapReduce, but I can’t because
 I get the error [1]. I don’t have any more exceptions in the other logs.

 My MapReduce runtime has 1 *ResourceManager* and 3 *NodeManagers*, and
 HDFS is running properly (all nodes are alive).

 I have looked at all the logs, and I still don’t understand why I get this
 error. Any help to fix this? Is it a problem with the remote job that I am
 submitting?

 [1]

 $ less logs/hadoop-ubuntu-namenode-ip-172-31-17-45.log

 2015-05-18 10:42:16,570 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK* 
 NameNode.addBlock: file 
 /tmp/hadoop-yarn/staging/xeon/.staging/job_1431945660897_0001/job.split
 fileId=16394 for DFSClient_NONMAPREDUCE_-1923902075_14
 2015-05-18 10:42:16,570 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.getAdditionalBlock: 
 /tmp/hadoop-yarn/staging/xeon/.staging/job_1431945660897_0001/job.
 split inodeId 16394 for DFSClient_NONMAPREDUCE_-1923902075_14
 2015-05-18 10:42:16,571 DEBUG 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
 choose remote rack (location = ~/default-rack), fallback to lo
 cal rack
 org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:691)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:580)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:348)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:214)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:111)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:126)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1545)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)

 ​

 --
 --






Re: Jr. to Mid Level Big Data jobs in Bay Area

2015-05-17 Thread Billy Watson
Uh, it's not about being tolerant. It's the wrong forum for it. There's
enough chatter on here and there are 50 job boards all over the
internetwebs. Please use the proper forums.

William Watson
Software Engineer
(904) 705-7056 PCS

On Sun, May 17, 2015 at 9:14 PM, Juan Suero juan.su...@gmail.com wrote:

 He's a human asking for human advice... it's ok, methinks.
 We should live in a more tolerant world.
 Thanks.

 On Sun, May 17, 2015 at 8:10 PM, Stephen Boesch java...@gmail.com wrote:

 Hi,  This is not a job board. Thanks.

 2015-05-17 16:00 GMT-07:00 Adam Pritchard apritchard...@gmail.com:

 Hi everyone,

 I was wondering if any of you know any openings looking to hire a big
 data dev in the Palo Alto area.

 Main thing I am looking for is to be on a team that will embrace having
 a Jr to Mid level big data developer, where I can grow my skill set and
 contribute.


 My skills are:

 3 years Java
 1.5 years Hadoop
 1.5 years Hbase
 1 year map reduce
 1 year Apache Storm
 1 year Apache Spark (did a Spark Streaming project in Scala)

 5 years PHP
 3 years iOS development
 4 years Amazon ec2 experience


 Currently I am working in San Francisco as a big data developer, but the
 team I'm on is content leaving me work that I already knew how to do when I
 came to the team (web services) and I want to work with big data
 technologies at least 70% of the time.


 I am not a senior big data dev, but I am motivated to be and am just
 looking for an opportunity where I can work all day or most of the day with
 big data technologies, and contribute and learn from the project at hand.


 Thanks if anyone can share any information,


 Adam







Re: Unable to Find S3N Filesystem Hadoop 2.6

2015-04-22 Thread Billy Watson
Chris and Sato,

Thanks a bunch! I've been so swamped by these and other issues we've been
having in scrambling to upgrade our cluster that I forgot to file a bug. I
certainly complained aloud that the docs were insufficient, but I didn't do
anything to help the community so thanks a bunch for recognizing that and
helping me out!

William Watson
Software Engineer
(904) 705-7056 PCS

On Wed, Apr 22, 2015 at 3:06 AM, Takenori Sato ts...@cloudian.com wrote:

 Hi Billy, Chris,

 Let me share a couple of my findings.

 I believe this was introduced by HADOOP-10893,
 which first appeared in 2.6.0 (HDP 2.2).

 1. fs.s3n.impl

  We added a property to the core-site.xml file:

 You don't need to set this explicitly. It was never necessary in
 previous versions.

 Take a look at FileSystem#loadFileSystem, which is called from
 FileSystem#getFileSystemClass.
 Subclasses of FileSystem are loaded automatically if they are available on
 the classloader you care about.

 So you just need to make sure hadoop-aws.jar is on the classpath.

 For the file system shell, this is done in hadoop-env.sh,
 while for a MR job it is mapreduce.application.classpath,
 and for YARN, yarn.application.classpath.
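
 A quick way to see which implementation actually resolves for a scheme at
 runtime is the public FileSystem API (a minimal sketch, not part of the
 original message):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class SchemeCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Throws "No FileSystem for scheme: s3n" if hadoop-aws.jar is not on the classpath.
            Class<? extends FileSystem> cls = FileSystem.getFileSystemClass("s3n", conf);
            System.out.println(cls.getName()); // expect org.apache.hadoop.fs.s3native.NativeS3FileSystem
        }
    }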

 2. mapreduce.application.classpath

  And updated the classpath for mapreduce applications:

 Note that it points to a distributed cache on the default HDP 2.2
 distribution.

 <property>
   <name>mapreduce.application.classpath</name>
   <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure</value>
 </property>
 * $PWD/mr-framework/hadoop/share/hadoop/tools/lib/* contains
 hadoop-aws.jar(S3NFileSystem)

 While on a vanilla hadoop, it looks like standard paths as yours.

 <property>
   <name>mapreduce.application.classpath</name>
   <value>/hadoop-2.6.0/etc/hadoop:/hadoop-2.6.0/share/hadoop/common/lib/*:/hadoop-2.6.0/share/hadoop/common/*:/hadoop-2.6.0/share/hadoop/hdfs:/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/hadoop-2.6.0/share/hadoop/hdfs/*:/hadoop-2.6.0/share/hadoop/yarn/lib/*:/hadoop-2.6.0/share/hadoop/yarn/*:/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/hadoop-2.6.0/share/hadoop/mapreduce/*:/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/hadoop-2.6.0/share/hadoop/tools/lib/*</value>
 </property>

 Thanks,
 Sato

 On Wed, Apr 22, 2015 at 3:10 PM, Chris Nauroth cnaur...@hortonworks.com
 wrote:

  Hello Billy,

  I think your experience indicates that our documentation is
 insufficient for discussing how to configure and use the alternative file
 systems.  I filed issue HADOOP-11863 to track a documentation enhancement.

  https://issues.apache.org/jira/browse/HADOOP-11863

  Please feel free to watch that issue if you'd like to be informed as it
 makes progress.  Thank you for reporting back to the thread after you had a
 solution.

   Chris Nauroth
 Hortonworks
 http://hortonworks.com/


   From: Billy Watson <williamrwat...@gmail.com>
 Reply-To: user@hadoop.apache.org
 Date: Monday, April 20, 2015 at 11:14 AM
 To: user@hadoop.apache.org
 Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6

   We found the correct configs.

  This post was helpful, but didn't entirely work for us out of the box
 since we are using hadoop-pseudo-distributed.
 http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/

  We added a property to the core-site.xml file:

   <property>
     <name>fs.s3n.impl</name>
     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
     <description>Tell hadoop which class to use to access s3 URLs. This
     change became necessary in hadoop 2.6.0</description>
   </property>

  And updated the classpath for mapreduce applications:

   <property>
     <name>mapreduce.application.classpath</name>
     <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
     <description>The classpath specifically for mapreduce jobs. This
     override is nec. so that s3n URLs work on hadoop 2.6.0+</description>
   </property>

   William Watson
 Software Engineer
 (904) 705-7056 PCS

 On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson williamrwat...@gmail.com
 wrote:

 Thanks, anyways. Anyone else run into this issue?

   William Watson
 Software Engineer
 (904) 705-7056 PCS

   On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina 
 jaquil...@eagleeyet.net wrote:

  Sadly I'll have to pull back I have only run a Hadoop map reduce
 cluster with Amazon met

 Sent from my iPhone

 On 20 Apr 2015, at 16:53, Billy Watson williamrwat

Unable to Find S3N Filesystem Hadoop 2.6

2015-04-20 Thread Billy Watson
Hi,

I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command line
without issue. I have set some options in hadoop-env.sh to make sure all
the S3 stuff for hadoop 2.6 is set up correctly. (This was very confusing,
BTW, and there is not enough searchable documentation on the changes to the s3
stuff in hadoop 2.6, IMHO.)

Anyways, when I run a pig job which accesses s3, it gets to 16%, does not
fail in pig, but rather fails in mapreduce with “Error:
java.io.IOException: No FileSystem for scheme: s3n.”

I have added [hadoop-install-loc]/lib and
[hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
at 0% (before it ever gets to mapreduce) with a very similar “No filesystem
for scheme s3n” error.

I feel like at this point I just have to add the share/hadoop/tools/lib
directory (and maybe lib) to the right environment variable, but I can’t
figure out which environment variable that should be.

I appreciate any help, thanks!!


Stack trace:
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584) at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591) at
org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at
org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at
org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
at
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
at
org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at
org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:415) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


— Billy Watson

-- 
William Watson
Software Engineer
(904) 705-7056 PCS


Re: Unable to Find S3N Filesystem Hadoop 2.6

2015-04-20 Thread Billy Watson
I appreciate the response. These JAR files aren't 3rd party. They're
included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
loaded by default and now they have to be loaded manually, if needed.

Essentially the problem boils down to:

- need to access s3n URLs
- cannot access without including the tools directory
- after including tools directory in HADOOP_CLASSPATH, failures start
happening later in job
- need to find right env variable (or shell script or w/e) to include
jets3t and other JARs needed to access s3n URLs (I think)



William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina jaquil...@eagleeyet.net
wrote:

  You mention an environment variable. In the step before you specify the
 steps to run to get to the result, you can specify a bash script that will
 allow you to put any 3rd party jar files (for us we used esri) on the
 cluster and propagate them to all nodes in the cluster as well. You can
 ping me off list if you need further help. Thing is, I haven't used pig, but
 my boss and coworker wrote the mappers and reducers. Getting these jars to
 the entire cluster was a super small and simple bash script.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-04-20 15:17, Billy Watson wrote:

 Hi,

 I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command
 line without issue. I have set some options in hadoop-env.sh to make sure
 all the S3 stuff for hadoop 2.6 is set up correctly. (This was very
 confusing, BTW and not enough searchable documentation on changes to the s3
 stuff in hadoop 2.6 IMHO).

 Anyways, when I run a pig job which accesses s3, it gets to 16%, does not
 fail in pig, but rather fails in mapreduce with Error:
 java.io.IOException: No FileSystem for scheme: s3n.

 I have added [hadoop-install-loc]/lib and
 [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
 variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
 at 0% (before it ever gets to mapreduce) with a very similar No fileystem
 for scheme s3n error.

 I feel like at this point I just have to add the share/hadoop/tools/lib
 directory (and maybe lib) to the right environment variable, but I can't
 figure out which environment variable that should be.

 I appreciate any help, thanks!!


 Stack trace:
 org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at
 org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at
 org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at
 org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at
 org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
 at
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
 at
 org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
 at
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.init(MapTask.java:512)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at
 org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
 org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
 java.security.AccessController.doPrivileged(Native Method) at
 javax.security.auth.Subject.doAs(Subject.java:415) at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


 — Billy Watson

 --
  William Watson
 Software Engineer
 (904) 705-7056 PCS




Re: Unable to Find S3N Filesystem Hadoop 2.6

2015-04-20 Thread Billy Watson
This is an install on a CentOS 6 virtual machine used in our test
environment. We use HDP in staging and production and we discovered these
issues while trying to build a new cluster using HDP 2.2 which upgrades
from Hadoop 2.4 to Hadoop 2.6.

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina jaquil...@eagleeyet.net
 wrote:

  One thing I think I most likely missed completely: are you using
 an Amazon EMR cluster or something in-house?



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-04-20 16:21, Billy Watson wrote:

 I appreciate the response. These JAR files aren't 3rd party. They're
 included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
 loaded by default and now they have to be loaded manually, if needed.

 Essentially the problem boils down to:

 - need to access s3n URLs
 - cannot access without including the tools directory
 - after including tools directory in HADOOP_CLASSPATH, failures start
 happening later in job
 - need to find right env variable (or shell script or w/e) to include
 jets3t  other JARs needed to access s3n URLs (I think)



   William Watson
 Software Engineer
 (904) 705-7056 PCS

 On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina 
 jaquil...@eagleeyet.net wrote:

  you mention an environmental variable. the step before you specify the
 steps to run to get to the result. you can specify a bash script that will
 allow you to put any 3rd party jar files, for us we used esri, on the
 cluster and propagate them to all nodes in the cluster as well. You can
 ping me off list if you need further help. Thing is I havent used pig but
 my boss and coworker wrote the mappers and reducers. to get these jars to
 the entire cluster was a super small and simple bash script.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

   On 2015-04-20 15:17, Billy Watson wrote:

 Hi,

 I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command
 line without issue. I have set some options in hadoop-env.sh to make sure
 all the S3 stuff for hadoop 2.6 is set up correctly. (This was very
 confusing, BTW and not enough searchable documentation on changes to the s3
 stuff in hadoop 2.6 IMHO).

 Anyways, when I run a pig job which accesses s3, it gets to 16%, does not
 fail in pig, but rather fails in mapreduce with Error:
 java.io.IOException: No FileSystem for scheme: s3n.

 I have added [hadoop-install-loc]/lib and
 [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
 variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
 at 0% (before it ever gets to mapreduce) with a very similar No fileystem
 for scheme s3n error.

 I feel like at this point I just have to add the share/hadoop/tools/lib
 directory (and maybe lib) to the right environment variable, but I can't
 figure out which environment variable that should be.

 I appreciate any help, thanks!!


 Stack trace:
 org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at
 org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at
 org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at
 org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at
 org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
 at
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
 at
 org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
 at
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.init(MapTask.java:512)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at
 org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
 org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
 java.security.AccessController.doPrivileged(Native Method) at
 javax.security.auth.Subject.doAs(Subject.java:415) at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


 — Billy Watson

 --
  William Watson
 Software Engineer
 (904) 705-7056 PCS




Re: Unable to Find S3N Filesystem Hadoop 2.6

2015-04-20 Thread Billy Watson
We found the correct configs.

This post was helpful, but didn't entirely work for us out of the box since
we are using hadoop-pseudo-distributed.
http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/

We added a property to the core-site.xml file:

  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
    <description>Tell hadoop which class to use to access s3 URLs. This
    change became necessary in hadoop 2.6.0</description>
  </property>

And updated the classpath for mapreduce applications:

  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
    <description>The classpath specifically for mapreduce jobs. This
    override is nec. so that s3n URLs work on hadoop 2.6.0+</description>
  </property>

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson williamrwat...@gmail.com
wrote:

 Thanks, anyways. Anyone else run into this issue?

 William Watson
 Software Engineer
 (904) 705-7056 PCS

 On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina 
 jaquil...@eagleeyet.net wrote:

 Sadly I'll have to pull back I have only run a Hadoop map reduce cluster
 with Amazon met

 Sent from my iPhone

 On 20 Apr 2015, at 16:53, Billy Watson williamrwat...@gmail.com wrote:

 This is an install on a CentOS 6 virtual machine used in our test
 environment. We use HDP in staging and production and we discovered these
 issues while trying to build a new cluster using HDP 2.2 which upgrades
 from Hadoop 2.4 to Hadoop 2.6.

 William Watson
 Software Engineer
 (904) 705-7056 PCS

 On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina 
 jaquil...@eagleeyet.net wrote:

  One thing I think which i most likely missed completely is are you
 using an amazon EMR cluster or something in house?



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-04-20 16:21, Billy Watson wrote:

 I appreciate the response. These JAR files aren't 3rd party. They're
 included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
 loaded by default and now they have to be loaded manually, if needed.

 Essentially the problem boils down to:

 - need to access s3n URLs
 - cannot access without including the tools directory
 - after including tools directory in HADOOP_CLASSPATH, failures start
 happening later in job
 - need to find right env variable (or shell script or w/e) to include
 jets3t  other JARs needed to access s3n URLs (I think)



   William Watson
 Software Engineer
 (904) 705-7056 PCS

 On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina 
 jaquil...@eagleeyet.net wrote:

  you mention an environmental variable. the step before you specify
 the steps to run to get to the result. you can specify a bash script that
 will allow you to put any 3rd party jar files, for us we used esri, on the
 cluster and propagate them to all nodes in the cluster as well. You can
 ping me off list if you need further help. Thing is I havent used pig but
 my boss and coworker wrote the mappers and reducers. to get these jars to
 the entire cluster was a super small and simple bash script.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

   On 2015-04-20 15:17, Billy Watson wrote:

 Hi,

 I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command
 line without issue. I have set some options in hadoop-env.sh to make sure
 all the S3 stuff for hadoop 2.6 is set up correctly. (This was very
 confusing, BTW and not enough searchable documentation on changes to the s3
 stuff in hadoop 2.6 IMHO).

 Anyways, when I run a pig job which accesses s3, it gets to 16%, does
 not fail in pig, but rather fails in mapreduce with Error:
 java.io.IOException: No FileSystem for scheme: s3n.

 I have added [hadoop-install-loc]/lib and
 [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
 variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
 at 0% (before it ever gets to mapreduce) with a very similar No fileystem
 for scheme s3n error.

 I feel like at this point I just have to add the share/hadoop/tools/lib
 directory (and maybe lib) to the right environment variable, but I can't
 figure out which environment variable that should be.

 I appreciate any help, thanks!!


 Stack trace:
 org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at
 org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at
 org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at
 org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at
 org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498

Re: Unable to Find S3N Filesystem Hadoop 2.6

2015-04-20 Thread Billy Watson
Thanks, anyways. Anyone else run into this issue?

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina jaquil...@eagleeyet.net
 wrote:

 Sadly I'll have to pull back I have only run a Hadoop map reduce cluster
 with Amazon met

 Sent from my iPhone

 On 20 Apr 2015, at 16:53, Billy Watson williamrwat...@gmail.com wrote:

 This is an install on a CentOS 6 virtual machine used in our test
 environment. We use HDP in staging and production and we discovered these
 issues while trying to build a new cluster using HDP 2.2 which upgrades
 from Hadoop 2.4 to Hadoop 2.6.

 William Watson
 Software Engineer
 (904) 705-7056 PCS

 On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina 
 jaquil...@eagleeyet.net wrote:

  One thing I think which i most likely missed completely is are you
 using an amazon EMR cluster or something in house?



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-04-20 16:21, Billy Watson wrote:

 I appreciate the response. These JAR files aren't 3rd party. They're
 included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
 loaded by default and now they have to be loaded manually, if needed.

 Essentially the problem boils down to:

 - need to access s3n URLs
 - cannot access without including the tools directory
 - after including tools directory in HADOOP_CLASSPATH, failures start
 happening later in job
 - need to find right env variable (or shell script or w/e) to include
 jets3t  other JARs needed to access s3n URLs (I think)



   William Watson
 Software Engineer
 (904) 705-7056 PCS

 On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina 
 jaquil...@eagleeyet.net wrote:

  you mention an environmental variable. the step before you specify the
 steps to run to get to the result. you can specify a bash script that will
 allow you to put any 3rd party jar files, for us we used esri, on the
 cluster and propagate them to all nodes in the cluster as well. You can
 ping me off list if you need further help. Thing is I havent used pig but
 my boss and coworker wrote the mappers and reducers. to get these jars to
 the entire cluster was a super small and simple bash script.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

   On 2015-04-20 15:17, Billy Watson wrote:

 Hi,

 I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command
 line without issue. I have set some options in hadoop-env.sh to make sure
 all the S3 stuff for hadoop 2.6 is set up correctly. (This was very
 confusing, BTW and not enough searchable documentation on changes to the s3
 stuff in hadoop 2.6 IMHO).

 Anyways, when I run a pig job which accesses s3, it gets to 16%, does
 not fail in pig, but rather fails in mapreduce with Error:
 java.io.IOException: No FileSystem for scheme: s3n.

 I have added [hadoop-install-loc]/lib and
 [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
 variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
 at 0% (before it ever gets to mapreduce) with a very similar No fileystem
 for scheme s3n error.

 I feel like at this point I just have to add the share/hadoop/tools/lib
 directory (and maybe lib) to the right environment variable, but I can't
 figure out which environment variable that should be.

 I appreciate any help, thanks!!


 Stack trace:
 org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at
 org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at
 org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at
 org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at
 org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
 at
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
 at
 org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
 at
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.init(MapTask.java:512)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at
 org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
 org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at
 java.security.AccessController.doPrivileged(Native Method) at
 javax.security.auth.Subject.doAs(Subject.java:415) at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


 — Billy Watson

 --
  William Watson
 Software Engineer