[jira] [Created] (YARN-7889) Missing Kerberos token when checking for RM REST API availability
Eric Yang created YARN-7889:
---------------------------------
Summary: Missing Kerberos token when checking for RM REST API availability
Key: YARN-7889
URL: https://issues.apache.org/jira/browse/YARN-7889
Project: Hadoop YARN
Issue Type: Bug
Components: yarn-native-services
Affects Versions: 3.1.0
Reporter: Eric Yang

When checking which resource manager can be used for REST API requests, the client must send a Kerberos token to the REST API endpoint. The checking mechanism currently omits the Kerberos token.
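For illustration, a minimal sketch of a SPNEGO-aware availability probe follows; the {{/ws/v1/cluster/info}} path and the use of {{AuthenticatedURL}} are assumptions for the example, not the committed fix. An anonymous probe against a Kerberized RM gets a 401, so a healthy RM looks unavailable.

{code:java}
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.hadoop.security.authentication.client.AuthenticatedURL;

// Sketch: probe an RM REST endpoint with a negotiated Kerberos (SPNEGO)
// token instead of a plain anonymous HttpURLConnection, which a
// Kerberized RM rejects with 401.
public class RmRestProbe {
  public static boolean isRmAvailable(String rmWebAddress) {
    try {
      URL url = new URL(rmWebAddress + "/ws/v1/cluster/info");
      // AuthenticatedURL negotiates a SPNEGO token from the caller's
      // Kerberos login context and attaches it to the request.
      AuthenticatedURL.Token token = new AuthenticatedURL.Token();
      HttpURLConnection conn =
          new AuthenticatedURL().openConnection(url, token);
      return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
    } catch (Exception e) {
      return false; // includes AuthenticationException from the handshake
    }
  }
}
{code}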
[jira] [Created] (YARN-7888) container-log4j.properties is in hadoop-yarn-node-manager.jar
Haibo Chen created YARN-7888:
---------------------------------
Summary: container-log4j.properties is in hadoop-yarn-node-manager.jar
Key: YARN-7888
URL: https://issues.apache.org/jira/browse/YARN-7888
Project: Hadoop YARN
Issue Type: Bug
Reporter: Haibo Chen

The NM sets up log4j for containers with the container-log4j.properties file in its own jar. However, we should ideally not expose server-side jars to containers.
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/677/

[Feb 2, 2018 2:03:01 AM] (aengineer) HDFS-12942. Synchronization issue in FSDataSetImpl#moveBlock.
[Feb 2, 2018 3:25:41 AM] (yqlin) HDFS-13068. RBF: Add router admin option to manage safe mode.
[Feb 2, 2018 5:34:07 AM] (aajisaka) HDFS-13048. LowRedundancyReplicatedBlocks metric can be negative
[Feb 2, 2018 5:33:26 PM] (jlowe) HADOOP-15170. Add symlink support to FileUtil#unTarUsingJava.
[Feb 2, 2018 6:28:22 PM] (arun suresh) YARN-7839. Modify PlacementAlgorithm to Check node capacity before
[Feb 2, 2018 7:10:47 PM] (jianhe) YARN-7868. Provide improved error message when YARN service is disabled.
[Feb 2, 2018 7:37:51 PM] (arp) HADOOP-15198. Correct the spelling in CopyFilter.java. Contributed by
[Feb 2, 2018 8:51:27 PM] (hanishakoneru) HADOOP-15168. Add kdiag tool to hadoop command. Contributed by Bharat
[Feb 2, 2018 10:38:33 PM] (jianhe) YARN-7831. YARN Service CLI should use hadoop.http.authentication.type
[Feb 2, 2018 10:46:20 PM] (kkaranasos) YARN-7778. Merging of placement constraints defined at different levels.
[Feb 3, 2018 12:28:03 AM] (hanishakoneru) HDFS-13073. Cleanup code in InterQJournalProtocol.proto. Contributed by
[Feb 3, 2018 12:48:57 AM] (szegedim) YARN-7879. NM user is unable to access the application filecache due to
[Feb 3, 2018 1:18:42 AM] (weichiu) HDFS-11187. Optimize disk access for last partial chunk checksum of
[jira] [Created] (YARN-7887) SchedulingMonitor#PolicyInvoker catches all throwables
Young Chen created YARN-7887:
---------------------------------
Summary: SchedulingMonitor#PolicyInvoker catches all throwables
Key: YARN-7887
URL: https://issues.apache.org/jira/browse/YARN-7887
Project: Hadoop YARN
Issue Type: Bug
Reporter: Young Chen

SchedulingMonitor catches all Throwables. This prevents InvariantsCheckers from failing simulations when their invariants are violated. There should be some way to selectively allow SchedulingEditPolicies to propagate exceptions out of the SchedulingMonitor.
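To make the pattern concrete, a simplified sketch follows; the {{shouldFailOnError()}} hook is a hypothetical illustration of "selectively allow", not the real SchedulingMonitor or SchedulingEditPolicy API.

{code:java}
// Simplified sketch of the catch-all pattern described above, plus one
// possible opt-in escape hatch. The interface shape and the
// shouldFailOnError() flag are illustrative, not the Hadoop classes.
public class PolicyInvokerSketch implements Runnable {
  interface SchedulingEditPolicy {
    void editSchedule();            // may throw on an invariant violation
    boolean shouldFailOnError();    // hypothetical opt-in flag
  }

  private final SchedulingEditPolicy policy;

  PolicyInvokerSketch(SchedulingEditPolicy policy) {
    this.policy = policy;
  }

  @Override
  public void run() {
    try {
      policy.editSchedule();
    } catch (Throwable t) {
      // Today every error is swallowed here, so an InvariantsChecker
      // that throws on a violated invariant cannot fail the simulation.
      if (policy.shouldFailOnError()) {
        throw new RuntimeException("Policy failed", t); // propagate out
      }
      System.err.println("Exception in policy invoker: " + t);
    }
  }
}
{code}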
[jira] [Created] (YARN-7886) [GQ] Compare resource allocation achieved by rebalancing algorithms with single-cluster capacity scheduler allocation
Konstantinos Karanasos created YARN-7886:
---------------------------------
Summary: [GQ] Compare resource allocation achieved by rebalancing algorithms with single-cluster capacity scheduler allocation
Key: YARN-7886
URL: https://issues.apache.org/jira/browse/YARN-7886
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Konstantinos Karanasos
Assignee: Konstantinos Karanasos

Given a federated cluster, this Jira will enable us to compare the allocation achieved by our rebalancing algorithms against the allocation the Capacity Scheduler would achieve if it were operating over a single big cluster with the same total resources as the federated cluster.
[jira] [Created] (YARN-7885) [GQ] Generator for queue hierarchies over federated clusters
Konstantinos Karanasos created YARN-7885:
---------------------------------
Summary: [GQ] Generator for queue hierarchies over federated clusters
Key: YARN-7885
URL: https://issues.apache.org/jira/browse/YARN-7885
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Konstantinos Karanasos
Assignee: Konstantinos Karanasos

This Jira will focus on generating random queue hierarchies with different total/used/pending resources across the sub-clusters of a federated cluster.
[jira] [Created] (YARN-7884) Race condition in registering YARN service in ZooKeeper
Eric Yang created YARN-7884:
---------------------------------
Summary: Race condition in registering YARN service in ZooKeeper
Key: YARN-7884
URL: https://issues.apache.org/jira/browse/YARN-7884
Project: Hadoop YARN
Issue Type: Bug
Components: yarn-native-services
Affects Versions: 3.1.0
Reporter: Eric Yang

In a Kerberos-enabled cluster, there appears to be a race condition when registering a YARN service. The yarn-service znode is created after the AM has started and is reporting back to update component information. The service principal should have permission to create the znode, but ZooKeeper reports NoAuth.

{code}
2018-02-02 22:53:30,442 [main] INFO service.ServiceScheduler - Set registry user accounts: sasl:hbase
2018-02-02 22:53:30,471 [main] INFO zk.RegistrySecurity - Registry default system acls: [1,s{'world,'anyone} , 31,s{'sasl,'yarn} , 31,s{'sasl,'jhs} , 31,s{'sasl,'hdfs-demo} , 31,s{'sasl,'rm} , 31,s{'sasl,'hive} ]
2018-02-02 22:53:30,472 [main] INFO zk.RegistrySecurity - Registry User ACLs [31,s{'sasl,'hbase} , 31,s{'sasl,'hbase} ]
2018-02-02 22:53:30,503 [main] INFO event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.ComponentEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler
2018-02-02 22:53:30,504 [main] INFO event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.instance.ComponentInstanceEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler
2018-02-02 22:53:30,528 [main] INFO impl.NMClientAsyncImpl - Upper bound of the thread pool size is 500
2018-02-02 22:53:30,531 [main] INFO service.ServiceMaster - Starting service as user hbase/eyang-5.openstacklo...@example.com (auth:KERBEROS)
2018-02-02 22:53:30,545 [main] INFO ipc.CallQueueManager - Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2018-02-02 22:53:30,554 [Socket Reader #1 for port 56859] INFO ipc.Server - Starting Socket Reader #1 for port 56859
2018-02-02 22:53:30,589 [main] INFO pb.RpcServerFactoryPBImpl - Adding protocol org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPB to the server
2018-02-02 22:53:30,606 [IPC Server Responder] INFO ipc.Server - IPC Server Responder: starting
2018-02-02 22:53:30,607 [IPC Server listener on 56859] INFO ipc.Server - IPC Server listener on 56859: starting
2018-02-02 22:53:30,607 [main] INFO service.ClientAMService - Instantiated ClientAMService at eyang-5.openstacklocal/172.26.111.20:56859
2018-02-02 22:53:30,609 [main] INFO zk.CuratorService - Creating CuratorService with connection fixed ZK quorum "eyang-1.openstacklocal:2181"
2018-02-02 22:53:30,615 [main] INFO zk.RegistrySecurity - Enabling ZK sasl client: jaasClientEntry = Client, principal = hbase/eyang-5.openstacklo...@example.com, keytab = /etc/security/keytabs/hbase.service.keytab
2018-02-02 22:53:30,752 [main] INFO client.RMProxy - Connecting to ResourceManager at eyang-1.openstacklocal/172.26.111.17:8032
2018-02-02 22:53:30,909 [main] INFO service.ServiceScheduler - Registering appattempt_1517611904996_0001_01, abc into registry
2018-02-02 22:53:30,911 [main] INFO service.ServiceScheduler - Received 0 containers from previous attempt.
2018-02-02 22:53:31,072 [main] INFO service.ServiceScheduler - Could not read component paths: `/users/hbase/services/yarn-service/abc/components': No such file or directory: KeeperErrorCode = NoNode for /registry/users/hbase/services/yarn-service/abc/components
2018-02-02 22:53:31,074 [main] INFO service.ServiceScheduler - Triggering initial evaluation of component sleeper
2018-02-02 22:53:31,075 [main] INFO component.Component - [INIT COMPONENT sleeper]: 2 instances.
2018-02-02 22:53:31,094 [main] INFO component.Component - [COMPONENT sleeper] Transitioned from INIT to FLEXING on FLEX event.
2018-02-02 22:53:31,215 [pool-5-thread-1] ERROR service.ServiceScheduler - Failed to register app abc in registry
org.apache.hadoop.registry.client.exceptions.NoPathPermissionsException: `/registry/users/hbase/services/yarn-service/abc': Not authorized to access path; ACLs: [ 0x01: 'world,'anyone 0x1f: 'sasl,'yarn 0x1f: 'sasl,'jhs 0x1f: 'sasl,'hdfs-demo 0x1f: 'sasl,'rm 0x1f: 'sasl,'hive 0x1f: 'sasl,'hbase 0x1f: 'sasl,'hbase ]: KeeperErrorCode = NoAuth for /registry/users/hbase/services/yarn-service/abc
    at org.apache.hadoop.registry.client.impl.zk.CuratorService.operationFailure(CuratorService.java:412)
    at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkCreate(CuratorService.java:637)
    at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkSet(CuratorService.java:679)
    at
{code}
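For readers reproducing this, a minimal sketch of the Curator/ZooKeeper calls involved in creating a registry znode with SASL ACLs is below; the quorum address, principal, and path come from the log, but the code is an illustration, not the CuratorService implementation.

{code:java}
import java.util.Arrays;
import java.util.List;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryOneTime;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;

public class RegistryZnodeSketch {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory
        .newClient("eyang-1.openstacklocal:2181", new RetryOneTime(1000));
    client.start();

    // ACLs matching the ones printed in the log: world-readable (perm 1)
    // plus ALL (perm 31 = 0x1f) for the sasl:hbase principal.
    List<ACL> acls = Arrays.asList(
        new ACL(ZooDefs.Perms.READ, new Id("world", "anyone")),
        new ACL(ZooDefs.Perms.ALL, new Id("sasl", "hbase")));

    // creatingParentsIfNeeded() builds intermediate znodes too; if an
    // already-existing parent lacks CREATE permission for this SASL
    // identity, ZooKeeper fails with NoAuth, as seen in the AM log above.
    client.create().creatingParentsIfNeeded().withACL(acls)
        .forPath("/registry/users/hbase/services/yarn-service/abc");
    client.close();
  }
}
{code}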
[jira] [Resolved] (YARN-7883) Make HAR tool support IndexedLogAggregtionController
[ https://issues.apache.org/jira/browse/YARN-7883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong resolved YARN-7883.
---------------------------------
Resolution: Duplicate

> Make HAR tool support IndexedLogAggregtionController
> ----------------------------------------------------
>
> Key: YARN-7883
> URL: https://issues.apache.org/jira/browse/YARN-7883
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Priority: Major
>
> In https://issues.apache.org/jira/browse/MAPREDUCE-6415, we have created a
> tool to combine aggregated logs into HAR files which currently only work for
> TFileLogAggregationFileController. We should make it support
> IndexedLogAggregtionController as well.
[jira] [Created] (YARN-7883) Make HAR tool support IndexedLogAggregtionController
Xuan Gong created YARN-7883:
---------------------------------
Summary: Make HAR tool support IndexedLogAggregtionController
Key: YARN-7883
URL: https://issues.apache.org/jira/browse/YARN-7883
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong

In https://issues.apache.org/jira/browse/MAPREDUCE-6415, we created a tool to combine aggregated logs into HAR files, which currently only works for TFileLogAggregationFileController. We should make it support IndexedLogAggregtionController as well.
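For context, the HAR files in question are ordinary Hadoop Archives; the MAPREDUCE-6415 tool automates, per application, roughly what the stock archive command does by hand. A hand-run equivalent (the paths here are made-up examples) looks like:

{code}
# Illustrative only: combine one application's aggregated log files into a
# single HAR to cut the NameNode object count. Paths are invented examples.
hadoop archive -archiveName application_1517539453610_0001.har \
  -p /app-logs/hadoopuser/logs application_1517539453610_0001 \
  /app-logs/hadoopuser/logs-har
{code}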
[jira] [Created] (YARN-7882) Server side proxy for UI2 log viewer
Eric Yang created YARN-7882:
---------------------------------
Summary: Server side proxy for UI2 log viewer
Key: YARN-7882
URL: https://issues.apache.org/jira/browse/YARN-7882
Project: Hadoop YARN
Issue Type: Bug
Components: security, timelineserver, yarn-ui-v2
Affects Versions: 3.0.0
Reporter: Eric Yang

When viewing container logs in UI2, the log files are fetched directly from timeline server 2. Hadoop in simple security mode has no authenticator to verify that the user is authorized to view the logs. The general practice is to use Knox or another security proxy to authenticate the user and reverse-proxy the request to the Hadoop UI, so that information does not leak to anonymous users. The current UI2 log viewer implementation issues AJAX requests directly to timeline server 2, which could prevent Knox or other reverse-proxy software from working properly with the new design. It would be good to proxy these requests on the server side so that the browser cannot sidestep the authentication check.
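As a rough illustration of the proposed direction, a server-side pass-through could look like the sketch below; the servlet mapping, the timeline server address, and the stripped-down error handling are all assumptions for the example, not a design for the actual patch.

{code:java}
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical sketch: the UI2 web app fetches the log from the timeline
// server itself, so the browser only ever talks to the UI2 host and an
// external proxy (e.g. Knox) can enforce authentication in front of it.
public class LogProxyServlet extends HttpServlet {
  private static final String TIMELINE = "http://timeline-host:8188"; // assumed

  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws java.io.IOException {
    // Forward the original path and query string to the timeline server.
    // (A real implementation would validate the path and handle nulls.)
    String target = TIMELINE + req.getPathInfo()
        + (req.getQueryString() == null ? "" : "?" + req.getQueryString());
    HttpURLConnection conn =
        (HttpURLConnection) new URL(target).openConnection();
    resp.setStatus(conn.getResponseCode());
    try (InputStream in = conn.getInputStream();
         OutputStream out = resp.getOutputStream()) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n); // stream the log body back to the browser
      }
    }
  }
}
{code}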
Re: Apache Hadoop 3.0.1 Release plan
On Fri, Feb 2, 2018 at 10:22 AM, Arpit Agarwal wrote:
> Do you plan to roll an RC with an uncommitted fix? That isn't the right
> approach.

The fix will be committed to the release branch. We'll vote on the release, and if it receives a majority of +1 votes then it becomes 3.0.1. That's how the PMC decides how to move forward. In this case, that will also resolve whether or not it can be committed to trunk.

If this logic is unpersuasive, then we can require a 2/3 majority to replace the codebase. Either way, the PMC will vote to define the consensus view when it is not emergent.

> This issue has good visibility and enough discussion.

Yes, it has. We always prefer consensus to voting, but when discussion reveals that complete consensus is impossible, we still need a way forward. This is rare, and usually reserved for significant changes (like merging YARN). Frankly, it's embarrassing to resort to it here, but here we are.

> If there is a binding veto in effect then the change must be abandoned. Else
> you should be able to proceed with committing. However, 3.0.0 must be called
> out as an abandoned release if we commit it.

This is not accurate. A binding veto from any committer halts progress, but the PMC sets the direction of the project. That includes making decisions that are not universally accepted.

-C
Re: Apache Hadoop 3.0.1 Release plan
Hi Aaron/Lei,

Do you plan to roll an RC with an uncommitted fix? That isn't the right approach.

This issue has good visibility and enough discussion. If there is a binding veto in effect then the change must be abandoned. Else you should be able to proceed with committing. However, 3.0.0 must be called out as an abandoned release if we commit it.

Regards,
Arpit

On 2/1/18, 3:01 PM, "Lei Xu" wrote:

    Sounds good to me, ATM.

    On Thu, Feb 1, 2018 at 2:34 PM, Aaron T. Myers wrote:
    > Hey Anu,
    >
    > My feeling on HDFS-12990 is that we've discussed it quite a bit already and
    > it doesn't seem at this point like either side is going to budge. I'm
    > certainly happy to have a phone call about it, but I don't expect that we'd
    > make much progress.
    >
    > My suggestion is that we simply include the patch posted to HDFS-12990 in
    > the 3.0.1 RC and call this issue out clearly in the subsequent VOTE thread
    > for the 3.0.1 release. Eddy, are you up for that?
    >
    > Best,
    > Aaron
    >
    > On Thu, Feb 1, 2018 at 1:13 PM, Lei Xu wrote:
    >>
    >> +Xiao
    >>
    >> My understanding is that we will have this for 3.0.1. Xiao, could
    >> you give your inputs here?
    >>
    >> On Thu, Feb 1, 2018 at 11:55 AM, Anu Engineer wrote:
    >> > Hi Eddy,
    >> >
    >> > Thanks for driving this release. Just a quick question, do we have time
    >> > to close this issue?
    >> > https://issues.apache.org/jira/browse/HDFS-12990
    >> >
    >> > or are we abandoning it? I believe that this is the last window for us
    >> > to fix this issue.
    >> >
    >> > Should we have a call and get this resolved one way or another?
    >> >
    >> > Thanks
    >> > Anu
    >> >
    >> > On 2/1/18, 10:51 AM, "Lei Xu" wrote:
    >> >
    >> >     Hi, All
    >> >
    >> >     I just cut branch-3.0.1 from branch-3.0. Please make sure all patches
    >> >     targeted to 3.0.1 are checked in to both branch-3.0 and branch-3.0.1.
    >> >
    >> >     Thanks!
    >> >     Eddy
    >> >
    >> >     On Tue, Jan 9, 2018 at 11:17 AM, Lei Xu wrote:
    >> >     > Hi, All
    >> >     >
    >> >     > We have released Apache Hadoop 3.0.0 in December [1]. To further
    >> >     > improve the quality of the release, we plan to cut the branch-3.0.1
    >> >     > branch tomorrow in preparation for the Apache Hadoop 3.0.1 release.
    >> >     > The focus of 3.0.1 will be fixing blockers (3), critical bugs (1),
    >> >     > and bug fixes [2]. No new features or improvements should be included.
    >> >     >
    >> >     > We plan to cut branch-3.0.1 tomorrow (Jan 10th) and vote for an RC
    >> >     > on Feb 1st, targeting a Feb 9th release.
    >> >     >
    >> >     > Please feel free to share your insights.
    >> >     >
    >> >     > [1] https://www.mail-archive.com/general@hadoop.apache.org/msg07757.html
    >> >     > [2] https://issues.apache.org/jira/issues/?filter=12342842
    >> >     >
    >> >     > Best,
    >> >     > --
    >> >     > Lei (Eddy) Xu
    >> >     > Software Engineer, Cloudera
    >> >
    >> > --
    >> > Lei (Eddy) Xu
    >> > Software Engineer, Cloudera
    >>
    >> --
    >> Lei (Eddy) Xu
    >> Software Engineer, Cloudera

    --
    Lei (Eddy) Xu
    Software Engineer, Cloudera
[jira] [Created] (YARN-7881) Add Log Aggregation Status API to the RM Webservice
Gergely Novák created YARN-7881:
---------------------------------
Summary: Add Log Aggregation Status API to the RM Webservice
Key: YARN-7881
URL: https://issues.apache.org/jira/browse/YARN-7881
Project: Hadoop YARN
Issue Type: New Feature
Components: yarn
Reporter: Gergely Novák
Assignee: Gergely Novák

The old YARN UI has a page, /cluster/logaggregationstatus/{app_id}, which shows the log aggregation status for all the nodes that run containers for the given application. This information is not yet available through the RM REST API.
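To make the proposal concrete, the new endpoint might be queried along these lines; the path below is purely hypothetical, since this report does not specify one, and {app_id} is a placeholder.

{code}
# Hypothetical endpoint shape only -- the actual path is up to the patch.
curl "http://rm-host:8088/ws/v1/cluster/apps/{app_id}/logaggregationstatus"
{code}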
[jira] [Resolved] (YARN-7832) Logs page does not work for Running applications
[ https://issues.apache.org/jira/browse/YARN-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G resolved YARN-7832.
---------------------------------
Resolution: Not A Problem

Thanks [~yeshavora] for confirming. This is working fine with the combined System Metrics Publisher mode.

> Logs page does not work for Running applications
> ------------------------------------------------
>
> Key: YARN-7832
> URL: https://issues.apache.org/jira/browse/YARN-7832
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn-ui-v2
> Affects Versions: 3.0.0
> Reporter: Yesha Vora
> Assignee: Sunil G
> Priority: Critical
> Attachments: Screen Shot 2018-01-26 at 3.28.40 PM.png, YARN-7832.001.patch
>
> Scenario:
> * Run a yarn service application
> * When the application is Running, go to the log page
> * Select AttemptId and Container Id
> Logs are not shown on the UI. It complains "No log data available!"
>
> Here
> [http://xxx:8188/ws/v1/applicationhistory/containers/container_e07_1516919074719_0004_01_01/logs?_=1517009230358]
> the API fails with 500 Internal Server Error:
> {"exception":"WebApplicationException","message":"java.io.IOException: ","javaClassName":"javax.ws.rs.WebApplicationException"}
> {code:java}
> GET http://xxx:8188/ws/v1/applicationhistory/containers/container_e07_1516919074719_0004_01_01/logs?_=1517009230358 500 (Internal Server Error)
> (anonymous) @ VM779:1
> send @ vendor.js:572
> ajax @ vendor.js:548
> (anonymous) @ vendor.js:5119
> initializePromise @ vendor.js:2941
> Promise @ vendor.js:3005
> ajax @ vendor.js:5117
> ajax @ yarn-ui.js:1
> superWrapper @ vendor.js:1591
> query @ vendor.js:5112
> ember$data$lib$system$store$finders$$_query @ vendor.js:5177
> query @ vendor.js:5334
> fetchLogFilesForContainerId @ yarn-ui.js:132
> showLogFilesForContainerId @ yarn-ui.js:126
> run @ vendor.js:648
> join @ vendor.js:648
> run.join @ vendor.js:1510
> closureAction @ vendor.js:1865
> trigger @ vendor.js:302
> (anonymous) @ vendor.js:339
> each @ vendor.js:61
> each @ vendor.js:51
> trigger @ vendor.js:339
> d.select @ vendor.js:5598
> (anonymous) @ vendor.js:5598
> d.invoke @ vendor.js:5598
> d.trigger @ vendor.js:5598
> e.trigger @ vendor.js:5598
> (anonymous) @ vendor.js:5598
> d.invoke @ vendor.js:5598
> d.trigger @ vendor.js:5598
> (anonymous) @ vendor.js:5598
> dispatch @ vendor.js:306
> elemData.handle @ vendor.js:281
> {code}
[jira] [Created] (YARN-7880) FiCaSchedulerApp.commonCheckContainerAllocation throws NPE when running sls
Jiandan Yang created YARN-7880:
---------------------------------
Summary: FiCaSchedulerApp.commonCheckContainerAllocation throws NPE when running sls
Key: YARN-7880
URL: https://issues.apache.org/jira/browse/YARN-7880
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jiandan Yang

{code}
18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to RUNNING
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541)
{code}
[jira] [Created] (YARN-7879) NM user is unable to access the application filecache due to permissions
Shane Kumpf created YARN-7879:
---------------------------------
Summary: NM user is unable to access the application filecache due to permissions
Key: YARN-7879
URL: https://issues.apache.org/jira/browse/YARN-7879
Project: Hadoop YARN
Issue Type: Bug
Reporter: Shane Kumpf

I noticed the following log entries where localization was being retried for several MR AM files.

{code}
2018-02-02 02:53:02,905 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Resource /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar is missing, localizing it again
2018-02-02 02:53:42,908 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Resource /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml is missing, localizing it again
{code}

The cluster is configured to use LCE, and {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has a umask of {{0002}}. The cluster is configured with {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the local-user to {{nobody}}, who is not a login user and not in the {{hadoop}} group, produces the same results.

{code}
[hadoopuser@y7001 ~]$ umask
0002
[hadoopuser@y7001 ~]$ id
uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop)
{code}

The cause of the log entry was tracked down to a simple !file.exists() call in {{LocalResourcesTrackerImpl#isResourcePresent}}.

{code}
  public boolean isResourcePresent(LocalizedResource rsrc) {
    boolean ret = true;
    if (rsrc.getState() == ResourceState.LOCALIZED) {
      File file = new File(rsrc.getLocalPath().toUri().getRawPath().
          toString());
      if (!file.exists()) {
        ret = false;
      } else if (dirsHandler != null) {
        ret = checkLocalResource(rsrc);
      }
    }
    return ret;
  }
{code}

The Resources Tracker runs as the NM user, in this case {{yarn}}. The files being retried are in the filecache. The directories in the filecache are all owned by the local-user's primary group with 700 perms, which makes them unreadable by the {{yarn}} user.

{code}
[root@y7001 ~]# ls -la /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache
total 0
drwx--x---. 6 hadoopuser hadoop     46 Feb  2 03:06 .
drwxr-s---. 4 hadoopuser hadoop     73 Feb  2 03:07 ..
drwx------. 2 hadoopuser hadoopuser 61 Feb  2 03:05 10
drwx------. 3 hadoopuser hadoopuser 21 Feb  2 03:05 11
drwx------. 2 hadoopuser hadoopuser 45 Feb  2 03:06 12
drwx------. 2 hadoopuser hadoopuser 41 Feb  2 03:06 13
{code}

I saw YARN-5287, but that appears to be related to a restrictive umask and the usercache itself. I was unable to locate any other known issues that seemed relevant. Is the above already known? A configuration issue?
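One detail worth spelling out: {{File.exists()}} returns false whenever the underlying stat fails, including on EACCES, so a file that is present on disk looks "missing" to the NM user when a 700 parent directory blocks traversal. A standalone sketch (the path is a made-up example, not a YARN path) that shows the distinction:

{code:java}
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Run as a user that does not own /tmp/private (a 700 directory owned by
// someone else) to reproduce the failure mode described above.
public class ExistsVsAccess {
  public static void main(String[] args) {
    Path p = Paths.get("/tmp/private/job.jar");
    File f = p.toFile();
    // false for a non-owner: the 700 parent blocks traversal, stat fails.
    System.out.println("File.exists():     " + f.exists());
    // NIO can tell "definitely absent" apart from "not determinable":
    System.out.println("Files.exists():    " + Files.exists(p));
    System.out.println("Files.notExists(): " + Files.notExists(p));
    // When both NIO checks return false, existence could not be determined
    // (e.g. EACCES) -- the case isResourcePresent() conflates with deletion.
  }
}
{code}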