[jira] [Created] (YARN-6323) Rolling upgrade/config change is broken on timeline v2.
Li Lu created YARN-6323:
---

Summary: Rolling upgrade/config change is broken on timeline v2.
Key: YARN-6323
URL: https://issues.apache.org/jira/browse/YARN-6323
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Li Lu

Found this issue when deploying on real clusters. If there are apps running when we enable timeline v2 (with work-preserving restart enabled), node managers will fail to start due to missing app context data. We should probably assign some default names to these "left over" apps. I believe it's suboptimal to make users clean up the whole cluster before enabling timeline v2.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
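The "default names for left over apps" idea in YARN-6323 can be sketched as a recovery helper that synthesizes a context instead of failing. This is a minimal, self-contained illustration only: `FlowContext`, `RecoveredAppContexts`, and the chosen defaults (application ID as flow name, "1" as flow version, app start time as flow run ID) are hypothetical and not taken from the YARN code base.

```java
import java.util.Map;

/** Hypothetical per-application context needed by timeline v2 (not a YARN class). */
final class FlowContext {
  final String flowName;
  final String flowVersion;
  final long flowRunId;

  FlowContext(String flowName, String flowVersion, long flowRunId) {
    this.flowName = flowName;
    this.flowVersion = flowVersion;
    this.flowRunId = flowRunId;
  }
}

/** Hypothetical recovery helper: never fail startup because context is missing. */
final class RecoveredAppContexts {
  // Assumed default flow version for apps that predate enabling timeline v2.
  static final String DEFAULT_FLOW_VERSION = "1";

  static FlowContext contextFor(String appId, long appStartTime,
      Map<String, FlowContext> recovered) {
    FlowContext ctx = recovered.get(appId);
    if (ctx != null) {
      return ctx; // normal case: context was persisted for this app
    }
    // "Left over" app: synthesize a default context instead of throwing.
    return new FlowContext(appId, DEFAULT_FLOW_VERSION, appStartTime);
  }
}
```

The design choice here is to make missing context a soft condition during recovery, so enabling timeline v2 does not require draining the cluster first.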
[jira] [Created] (YARN-6316) Provide help information and documentation for TimelineSchemaCreator
Li Lu created YARN-6316:
---

Summary: Provide help information and documentation for TimelineSchemaCreator
Key: YARN-6316
URL: https://issues.apache.org/jira/browse/YARN-6316
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Li Lu

Right now there is no help information for the timeline schema creator. We probably want to provide an option to print help. Also, ideally, if users pass in no arguments, we may want to print help instead of directly creating the tables. This will simplify cluster operations and timeline v2 deployments.
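The proposed behaviour (print help on an explicit help flag or when no arguments are given, and only create tables when asked) could look roughly like this. It is a sketch under assumptions: `SchemaCreatorArgs` and the `-create`/`-help` flag names are hypothetical and are not the actual TimelineSchemaCreator options.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical argument handling for a schema-creator style CLI. */
final class SchemaCreatorArgs {
  // Illustrative flag names only; not TimelineSchemaCreator's real options.
  static final List<String> HELP_FLAGS = Arrays.asList("-help", "--help", "-h");

  static String usage() {
    return "Usage: schema-creator [-create] [-help]\n"
        + "  -create  create the timeline tables\n"
        + "  -help    print this message";
  }

  /** Help is printed when no arguments are given or any help flag is present. */
  static boolean shouldPrintHelp(String[] args) {
    if (args.length == 0) {
      return true; // no arguments: do not silently create tables
    }
    for (String arg : args) {
      if (HELP_FLAGS.contains(arg)) {
        return true;
      }
    }
    return false;
  }
}
```

Requiring an explicit flag such as `-create` is what makes "no arguments means help" safe: table creation can no longer happen by accident.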
[jira] [Created] (YARN-6293) Investigate Java 7 compatibility for new YARN UI
Li Lu created YARN-6293:
---

Summary: Investigate Java 7 compatibility for new YARN UI
Key: YARN-6293
URL: https://issues.apache.org/jira/browse/YARN-6293
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Li Lu

Right now, when trying the new YARN UI with Java 7, I get the following warning:
{code}
[INFO] --- maven-enforcer-plugin:1.4.1:enforce (dist-enforce) @ hadoop-yarn-ui ---
[WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion failed with message:
Detected JDK Version: 1.7.0-67 is not in the allowed range [1.8,).
{code}
While this warning does not currently cause any trouble for trunk integration, the JDK requirement would block users who want to package the new UI with branch-2 based code. So the question is: is there any specific component in the new UI codebase that prevents us from using Java 7? As I recall, it is a JS-based implementation, right?
[jira] [Created] (YARN-6228) EntityGroupFSTimelineStore should allow configurable cache stores.
Li Lu created YARN-6228:
---

Summary: EntityGroupFSTimelineStore should allow configurable cache stores.
Key: YARN-6228
URL: https://issues.apache.org/jira/browse/YARN-6228
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu

We should allow users to configure which cache store to use for EntityGroupFSTimelineStore.
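A common way to make the cache store configurable is to read an implementation class name from configuration and instantiate it reflectively, with an in-memory default. The sketch below is self-contained and hypothetical: `CacheStore`, `MemoryCacheStore`, and the key `example.entity-group-fs.cache-store.class` are illustrative names, not YARN properties. Inside Hadoop one would more likely use `Configuration#getClass` together with `ReflectionUtils#newInstance`.

```java
import java.util.Map;

/** Minimal stand-in for a cache store abstraction (hypothetical). */
interface CacheStore {
  String name();
}

/** Default in-memory implementation used when nothing is configured. */
final class MemoryCacheStore implements CacheStore {
  public String name() {
    return "memory";
  }
}

final class CacheStoreFactory {
  // Hypothetical configuration key; not an actual YARN property name.
  static final String CACHE_STORE_CLASS_KEY = "example.entity-group-fs.cache-store.class";

  /** Instantiates the configured store class, failing fast on bad configuration. */
  static CacheStore create(Map<String, String> conf) {
    String className = conf.getOrDefault(CACHE_STORE_CLASS_KEY, MemoryCacheStore.class.getName());
    try {
      return Class.forName(className).asSubclass(CacheStore.class)
          .getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalArgumentException("Cannot instantiate cache store " + className, e);
    }
  }
}
```

Failing fast with a clear message when the configured class does not implement the interface is usually friendlier than a `ClassCastException` deep inside the store.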
[jira] [Created] (YARN-6030) Eliminate timelineServiceV2 boolean flag in TimelineClientImpl
Li Lu created YARN-6030:
---

Summary: Eliminate timelineServiceV2 boolean flag in TimelineClientImpl
Key: YARN-6030
URL: https://issues.apache.org/jira/browse/YARN-6030
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: YARN-5355
Reporter: Li Lu
Priority: Minor

I just discovered that we're still using a boolean flag {{timelineServiceV2}} after we introduced {{timelineServiceVersion}}. This seems a little error-prone. After the discussion, I think we should only use and trust {{timelineServiceVersion}}. {{timelineServiceV2}} is set upon client creation. Instead of creating a v2 client and setting this flag, maybe we should do a sanity check to make sure the creation call is consistent with the configuration?
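The suggestion amounts to deriving "is this v2?" from the version field rather than storing a separate boolean, plus a creation-time sanity check. A minimal sketch, assuming versions are floats such as 1.5 and 2.0 in the spirit of the `yarn.timeline-service.version` setting; `TimelineClientSketch` and its methods are hypothetical, not the real `TimelineClientImpl`.

```java
/** Hypothetical client sketch: the version is the single source of truth. */
final class TimelineClientSketch {
  private final float timelineServiceVersion;

  private TimelineClientSketch(float version) {
    this.timelineServiceVersion = version;
  }

  /** Derived on demand, so it can never disagree with the version field. */
  boolean isV2() {
    return (int) timelineServiceVersion == 2;
  }

  /** Generic creation path: behaviour follows the configured version. */
  static TimelineClientSketch create(float configuredVersion) {
    return new TimelineClientSketch(configuredVersion);
  }

  /** Sanity check: refuse to build a v2 client when the configuration says otherwise. */
  static TimelineClientSketch createV2(float configuredVersion) {
    if ((int) configuredVersion != 2) {
      throw new IllegalStateException(
          "v2 client requested but configured timeline service version is " + configuredVersion);
    }
    return new TimelineClientSketch(configuredVersion);
  }
}
```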
[jira] [Created] (YARN-5974) Remove direct reference to TimelineClientImpl
Li Lu created YARN-5974:
---

Summary: Remove direct reference to TimelineClientImpl
Key: YARN-5974
URL: https://issues.apache.org/jira/browse/YARN-5974
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: YARN-5355
Reporter: Li Lu

[~sjlee0]'s quick audit shows that the following classes reference TimelineClientImpl directly today:
JobHistoryFileReplayMapperV1 (MR)
SimpleEntityWriterV1 (MR)
TestDistributedShell (DS)
TestDSAppMaster (DS)
TestNMTimelinePublisher (node manager)
TestTimelineWebServicesWithSSL (AHS)
This is not the right way to use TimelineClient, and we should avoid direct references to TimelineClientImpl as much as possible. Newcomers to the community are more than welcome to take this. If this remains unassigned for ~24 hours, I'll jump in and do a quick fix.
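The pattern being asked for is "depend on the abstraction, obtain instances through a factory". In the Hadoop 2.x API, the abstract `TimelineClient` class offers a static `createTimelineClient()` factory for this purpose. The self-contained sketch below only illustrates the pattern; `TimelineClientApi` and `TimelineClientImplSketch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Public abstraction callers should depend on (illustrates the idea, not YARN's API). */
abstract class TimelineClientApi {
  abstract void putEntity(String entity);

  /** Factory method: callers never name the implementation class. */
  static TimelineClientApi create() {
    return new TimelineClientImplSketch();
  }
}

/** Implementation detail that can change without touching callers. */
final class TimelineClientImplSketch extends TimelineClientApi {
  final List<String> sent = new ArrayList<>();

  @Override
  void putEntity(String entity) {
    sent.add(entity); // stand-in for the actual network call
  }
}
```

Callers written as `TimelineClientApi client = TimelineClientApi.create();` keep working if the implementation is renamed or split, which is the point of removing direct `TimelineClientImpl` references.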
[jira] [Created] (YARN-5780) [YARN native service] Allowing YARN native services to post data to timeline service V.2
Li Lu created YARN-5780:
---

Summary: [YARN native service] Allowing YARN native services to post data to timeline service V.2
Key: YARN-5780
URL: https://issues.apache.org/jira/browse/YARN-5780
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Li Lu
Assignee: Li Lu

The basic end-to-end workflow of timeline service v.2 has been merged into trunk. In YARN native services, we would like to post some service-specific data to timeline v.2.
[jira] [Resolved] (YARN-4220) [Storage implementation] Support getEntities with only Application id but no userId
[ https://issues.apache.org/jira/browse/YARN-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu resolved YARN-4220.
-
Resolution: Won't Fix

Finished the UI POC, and we no longer require these APIs. Closing as won't fix for now.

> [Storage implementation] Support getEntities with only Application id but no userId
> ---
>
> Key: YARN-4220
> URL: https://issues.apache.org/jira/browse/YARN-4220
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Li Lu
> Assignee: Li Lu
> Priority: Minor
> Labels: YARN-5355
>
> Currently we're enforcing flow and flowrun id to be non-null values on
> {{getEntities}}. We can actually query the appToFlow table to figure out an
> application's flow id and flowrun id if they're missing. This will simplify
> normal queries.
[jira] [Created] (YARN-5747) Application timeline metric aggregation in timeline v2 will lose the last round of aggregation when an application finishes
Li Lu created YARN-5747:
---

Summary: Application timeline metric aggregation in timeline v2 will lose the last round of aggregation when an application finishes
Key: YARN-5747
URL: https://issues.apache.org/jira/browse/YARN-5747
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu

As discussed in YARN-3816, when an application finishes we should perform an extra round of application-level timeline aggregation. Otherwise, data posted after the last round of aggregation will be lost.
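The fix described in YARN-5747 is to stop the periodic aggregation and then run one more round, so that anything posted after the last scheduled round is still published. A minimal sketch using a `ScheduledExecutorService`; `AppLevelAggregator` is hypothetical and "aggregation" is reduced to summing numbers. Note that by default a `ScheduledThreadPoolExecutor` cancels pending periodic tasks on `shutdown()`, which is why the extra round has to be triggered explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical app-level aggregator: periodic rounds plus one final round on stop. */
final class AppLevelAggregator {
  private final List<Long> pending = new ArrayList<>();
  private final List<Long> published = new ArrayList<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void start(long periodMs) {
    scheduler.scheduleAtFixedRate(this::aggregate, periodMs, periodMs, TimeUnit.MILLISECONDS);
  }

  synchronized void post(long metricValue) {
    pending.add(metricValue);
  }

  /** One aggregation round: sum pending values and publish the total. */
  synchronized void aggregate() {
    if (pending.isEmpty()) {
      return; // nothing new since the last round
    }
    long sum = 0;
    for (long v : pending) {
      sum += v;
    }
    pending.clear();
    published.add(sum);
  }

  /** Stop periodic rounds, then run one extra round so late data is not lost. */
  void stop() {
    scheduler.shutdown();
    try {
      scheduler.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    aggregate(); // final round covers data posted after the last scheduled run
  }

  synchronized List<Long> published() {
    return new ArrayList<>(published);
  }
}
```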
[jira] [Resolved] (YARN-3914) Entity created time should be part of the row key of entity table
[ https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu resolved YARN-3914.
-
Resolution: Won't Fix

Closing this issue as discussed before. The community agreed not to move forward on this issue. BTW, some of the problems raised in this issue can be addressed by the entity prefix design proposed in YARN-5715.

> Entity created time should be part of the row key of entity table
> -
>
> Key: YARN-3914
> URL: https://issues.apache.org/jira/browse/YARN-3914
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
> Labels: YARN-5355
>
> Entity created time should be part of the row key of entity table, between
> entity type and entity Id. The reason to have it is to index the entities.
> Though we cannot index the entities for all kinds of information, indexing
> them according to the created time is very necessary. Without it, every query
> for the latest entities that belong to an application and a type will scan
> through all the entities that belong to them. For example, listing the 100
> most recently started containers in a YARN app would require such a full scan.
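Although this proposal was closed as Won't Fix, the indexing idea it describes can be illustrated with a sorted map standing in for the entity table. The sketch places an inverted created time between entity type and entity ID so that a forward prefix scan returns the newest entities first. Assumptions: `EntityRowKeySketch`, the "!" separator, and the zero-padded decimal encoding are all hypothetical; a real HBase row key would more likely use fixed-width bytes, and created times are assumed to be non-negative epoch milliseconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/** Hypothetical string row key: appPrefix!entityType!invertedCreatedTime!entityId. */
final class EntityRowKeySketch {
  static String rowKey(String appPrefix, String entityType, long createdTime, String entityId) {
    // Invert so an ascending scan yields the most recently created entities first.
    long inverted = Long.MAX_VALUE - createdTime;
    // Zero-pad to 19 digits so lexicographic order matches numeric order.
    return String.join("!", appPrefix, entityType, String.format("%019d", inverted), entityId);
  }

  /** Prefix scan returning the n most recently created entities of one type. */
  static List<String> latest(NavigableMap<String, String> table, String appPrefix,
      String entityType, int n) {
    String prefix = appPrefix + "!" + entityType + "!";
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, String> e : table.tailMap(prefix, true).entrySet()) {
      if (result.size() == n || !e.getKey().startsWith(prefix)) {
        break; // got enough rows, or left this app/type key range
      }
      result.add(e.getValue());
    }
    return result;
  }
}
```

The scan stops after n rows instead of reading every entity of the type, which is the cost the original description wanted to avoid.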
[jira] [Created] (YARN-5739) Provide timeline reader API to list available timeline entity types for one application
Li Lu created YARN-5739:
---

Summary: Provide timeline reader API to list available timeline entity types for one application
Key: YARN-5739
URL: https://issues.apache.org/jira/browse/YARN-5739
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelinereader
Reporter: Li Lu
Assignee: Li Lu

Right now we only show part of the available timeline entity data in the new YARN UI. However, some data (especially library-specific data) cannot be queried by the web UI. It would be appealing for the UI to provide an "entity browser" for each YARN application. Actually, simply dumping out available timeline entities (with proper pagination, of course) would be pretty helpful for UI users. On the timeline side, we're not far from this goal. Right now I believe the only thing missing is listing all available entity types within one application. The challenge is that we're not storing this data for each application, but given that this kind of call is relatively rare (compared to writes and updates), we can perform some scanning at read time.
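The read-time scanning idea can be sketched against a sorted map standing in for the entity table: walk the application's key range and collect the distinct type components. Assumptions: keys have the hypothetical form `appId!entityType!entityId`, and `EntityTypeLister` is an illustrative name. A real store could do better by seeking past each type once it has been seen.

```java
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical read-time scan over keys of the form appId!entityType!entityId. */
final class EntityTypeLister {
  static Set<String> listEntityTypes(NavigableMap<String, ?> table, String appId) {
    String prefix = appId + "!";
    Set<String> types = new TreeSet<>();
    for (String key : table.tailMap(prefix, true).keySet()) {
      if (!key.startsWith(prefix)) {
        break; // left this application's key range
      }
      int end = key.indexOf('!', prefix.length());
      types.add(end < 0 ? key.substring(prefix.length()) : key.substring(prefix.length(), end));
    }
    return types;
  }
}
```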
[jira] [Created] (YARN-5681) Change collector discovery to support collectors mapped to clients but not applications
Li Lu created YARN-5681:
---

Summary: Change collector discovery to support collectors mapped to clients but not applications
Key: YARN-5681
URL: https://issues.apache.org/jira/browse/YARN-5681
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu

As discussed in YARN-3981, we need the service discovery mechanism to map collectors to their actual "ids", which may or may not be a concrete application Id. This JIRA proposes to generalize the concept of collector id in the collector service discovery mechanism.
[jira] [Created] (YARN-5638) Introduce a collector Id to uniquely identify collectors and their creation order
Li Lu created YARN-5638:
---

Summary: Introduce a collector Id to uniquely identify collectors and their creation order
Key: YARN-5638
URL: https://issues.apache.org/jira/browse/YARN-5638
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu

As discussed in YARN-3359, we need to further identify timeline collectors and their creation order for better service discovery and resource isolation. This JIRA proposes to use to accurately identify each timeline collector.
[jira] [Resolved] (YARN-5629) Persist collector discovery information to support RM HA
[ https://issues.apache.org/jira/browse/YARN-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu resolved YARN-5629.
-
Resolution: Duplicate

Duplicate of YARN-3359.

> Persist collector discovery information to support RM HA
>
>
> Key: YARN-5629
> URL: https://issues.apache.org/jira/browse/YARN-5629
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Li Lu
> Assignee: Li Lu
>
> As discussed in YARN-3039, we deliberately delayed the work to persist
> collector discovery information. However, this feature becomes a blocker if
> we want to run timeline v2 on an HA-enabled cluster.
[jira] [Created] (YARN-5629) Persist collector discovery information to support RM HA
Li Lu created YARN-5629:
---

Summary: Persist collector discovery information to support RM HA
Key: YARN-5629
URL: https://issues.apache.org/jira/browse/YARN-5629
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu

As discussed in YARN-3039, we deliberately delayed the work to persist collector discovery information. However, this feature becomes a blocker if we want to run timeline v2 on an HA-enabled cluster.
[jira] [Resolved] (YARN-5291) Store node information for finished containers in timeline v2
[ https://issues.apache.org/jira/browse/YARN-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu resolved YARN-5291.
-
Resolution: Invalid

I overlooked the info field in container entities. The required information is already included.

> Store node information for finished containers in timeline v2
> -
>
> Key: YARN-5291
> URL: https://issues.apache.org/jira/browse/YARN-5291
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Li Lu
>
[jira] [Created] (YARN-5291) Store node information for finished containers in timeline v2
Li Lu created YARN-5291:
---

Summary: Store node information for finished containers in timeline v2
Key: YARN-5291
URL: https://issues.apache.org/jira/browse/YARN-5291
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Li Lu
[jira] [Resolved] (YARN-5232) Support for specifying a path for ATS plugin jars
[ https://issues.apache.org/jira/browse/YARN-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu resolved YARN-5232.
-
Resolution: Duplicate

JIRA problem; closing the duplicated issue.

> Support for specifying a path for ATS plugin jars
> -
>
> Key: YARN-5232
> URL: https://issues.apache.org/jira/browse/YARN-5232
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: 2.8.0
> Reporter: Li Lu
> Assignee: Li Lu
>
> Third-party plugins need to add their jars to ATS. Most of the time,
> isolation is not needed. However, there needs to be a way to specify the
> path. For now, the jars on that path can be added to the default classloader.
[jira] [Created] (YARN-5232) Support for specifying a path for ATS plugin jars
Li Lu created YARN-5232:
---

Summary: Support for specifying a path for ATS plugin jars
Key: YARN-5232
URL: https://issues.apache.org/jira/browse/YARN-5232
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Li Lu
Assignee: Li Lu

Third-party plugins need to add their jars to ATS. Most of the time, isolation is not needed. However, there needs to be a way to specify the path. For now, the jars on that path can be added to the default classloader.
[jira] [Created] (YARN-5233) Support for specifying a path for ATS plugin jars
Li Lu created YARN-5233:
---

Summary: Support for specifying a path for ATS plugin jars
Key: YARN-5233
URL: https://issues.apache.org/jira/browse/YARN-5233
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Li Lu
Assignee: Li Lu

Third-party plugins need to add their jars to ATS. Most of the time, isolation is not needed. However, there needs to be a way to specify the path. For now, the jars on that path can be added to the default classloader.
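One way to load plugin jars from a configured directory without real isolation is a child `URLClassLoader` whose parent is the server's default loader, so plugin classes can see server classes. This is a swapped-in technique, not literally "adding jars to the default classloader" (which has no supported API on recent JDKs). `PluginJarPath` and its methods are hypothetical; how the directory is configured is left out.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for loading third-party plugin jars from one directory. */
final class PluginJarPath {
  /** Collects jar URLs; a missing or empty directory yields no URLs. */
  static URL[] jarUrls(File dir) {
    List<URL> urls = new ArrayList<>();
    File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
    if (jars != null) {
      for (File jar : jars) {
        try {
          urls.add(jar.toURI().toURL());
        } catch (MalformedURLException e) {
          throw new IllegalArgumentException("Bad plugin jar path " + jar, e);
        }
      }
    }
    return urls.toArray(new URL[0]);
  }

  /** Parent-first loader: no isolation, plugins see the server's classes. */
  static ClassLoader pluginLoader(File dir, ClassLoader parent) {
    return new URLClassLoader(jarUrls(dir), parent);
  }
}
```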
[jira] [Resolved] (YARN-5138) fix "no findbugs output file" error for hadoop-yarn-server-timelineservice-hbase-tests
[ https://issues.apache.org/jira/browse/YARN-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu resolved YARN-5138.
-
Resolution: Fixed

> fix "no findbugs output file" error for
> hadoop-yarn-server-timelineservice-hbase-tests
> --
>
> Key: YARN-5138
> URL: https://issues.apache.org/jira/browse/YARN-5138
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Vrushali C
> Assignee: Vrushali C
> Fix For: YARN-2928
>
> Attachments: YARN-5138-YARN-2928.01.patch
>
>
> For package hadoop-yarn-server-timelineservice-hbase-tests, mvn does not
> generate findbugs xml files presently.
> The reason is that there was an issue where findbugs would not generate
> anything for projects with only test classes
> http://mojo.10943.n7.nabble.com/jira-Created-MFINDBUGS-132-Findbugs-doesn-t-run-on-projects-containing-only-test-classes-td13364.html
> which was fixed in findbugs release 2.3.2.
> But that requires the "includeTests" parameter to be set when invoking findbugs.
> Filing jira to discuss whether we need findbugs for this package or whether we
> should explicitly not invoke findbugs for it.
> Related jira discussion: YARN-5097
[jira] [Created] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
Li Lu created YARN-5156:
---

Summary: YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
Key: YARN-5156
URL: https://issues.apache.org/jira/browse/YARN-5156
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Li Lu

When a container finishes, we report YARN_CONTAINER_STATE: "RUNNING". Is this by design, or is it a bug?
{code}
{
  metrics: [ ],
  events: [
    {
      id: "YARN_CONTAINER_FINISHED",
      timestamp: 1464213765890,
      info: {
        YARN_CONTAINER_EXIT_STATUS: 0,
        YARN_CONTAINER_STATE: "RUNNING",
        YARN_CONTAINER_DIAGNOSTICS_INFO: ""
      }
    },
    {
      id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
      timestamp: 1464213761133,
      info: { }
    },
    {
      id: "YARN_CONTAINER_CREATED",
      timestamp: 1464213761132,
      info: { }
    },
    {
      id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
      timestamp: 1464213761132,
      info: { }
    }
  ],
  id: "container_e15_1464213707405_0001_01_18",
  type: "YARN_CONTAINER",
  createdtime: 1464213761132,
  info: {
    YARN_CONTAINER_ALLOCATED_PRIORITY: "20",
    YARN_CONTAINER_ALLOCATED_VCORE: 1,
    YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0",
    UID: "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_18",
    YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164",
    YARN_CONTAINER_ALLOCATED_MEMORY: 1024,
    SYSTEM_INFO_PARENT_ENTITY: {
      type: "YARN_APPLICATION_ATTEMPT",
      id: "appattempt_1464213707405_0001_01"
    },
    YARN_CONTAINER_ALLOCATED_PORT: 64694
  },
  configs: { },
  isrelatedto: { },
  relatesto: { }
}
{code}
[jira] [Created] (YARN-5050) Code cleanup for TestDistributedShell
Li Lu created YARN-5050:
---

Summary: Code cleanup for TestDistributedShell
Key: YARN-5050
URL: https://issues.apache.org/jira/browse/YARN-5050
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu

We introduced some small errors in yesterday's rebase. Also, some timeout settings for timeline v2 tests are obsolete since we introduced global timeout settings in YARN-4545.
[jira] [Created] (YARN-5018) Online aggregation logic should not run immediately after collectors got started
Li Lu created YARN-5018:
---

Summary: Online aggregation logic should not run immediately after collectors got started
Key: YARN-5018
URL: https://issues.apache.org/jira/browse/YARN-5018
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu

In the app-level collector, we launch the aggregation logic immediately after the collector starts. However, at that point, important context data has not yet been published to the container. Also, if the aggregation result is empty, we do not need to publish it.
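Both points in YARN-5018 (do not aggregate before context data is available, and do not publish empty results) can be expressed as guards at the start of each aggregation round. A minimal sketch with hypothetical names (`GatedAggregation`, a plain string as the "context"); in practice the periodic schedule would also use a non-zero initial delay rather than starting immediately.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical aggregation round that waits for context and skips empty results. */
final class GatedAggregation {
  private volatile String flowContext; // null until context data has been published

  void setContext(String context) {
    this.flowContext = context;
  }

  /** Returns the record to publish, or empty when there is nothing (yet) to publish. */
  Optional<String> runRound(Map<String, Long> aggregated) {
    String ctx = flowContext;
    if (ctx == null) {
      return Optional.empty(); // context not published yet: skip this round
    }
    if (aggregated.isEmpty()) {
      return Optional.empty(); // empty result: nothing worth publishing
    }
    return Optional.of(ctx + " " + aggregated);
  }
}
```

A skipped round does not drop data in this sketch: the caller still holds the aggregated values and can pass them again in the next round.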
[jira] [Created] (YARN-4987) EntityGroupFS timeline store needs to handle null storage gracefully
Li Lu created YARN-4987:
---

Summary: EntityGroupFS timeline store needs to handle null storage gracefully
Key: YARN-4987
URL: https://issues.apache.org/jira/browse/YARN-4987
Project: Hadoop YARN
Issue Type: Bug
Reporter: Li Lu
Assignee: Li Lu

Due to concurrency, key-value based timeline storage may return null on reads that are concurrent with service stop. The EntityGroupFS timeline store needs to handle this case gracefully.
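Handling this "gracefully" most likely means treating a released store or a null read result as "no data" rather than letting a `NullPointerException` escape. A minimal sketch; `KeyValueTimelineStore` and `NullSafeReads` are hypothetical stand-ins, and entities are reduced to strings.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical key-value timeline storage; may return null if stopped mid-read. */
interface KeyValueTimelineStore {
  List<String> read(String key);
}

final class NullSafeReads {
  static List<String> getEntities(KeyValueTimelineStore store, String key) {
    if (store == null) {
      return Collections.emptyList(); // store already released during service stop
    }
    List<String> result = store.read(key);
    // A null result from a read racing with service stop means "nothing available".
    return result == null ? Collections.<String>emptyList() : result;
  }
}
```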
[jira] [Created] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode
Li Lu created YARN-4983:
---

Summary: JVM and UGI metrics disappear after RM is once transitioned to standby mode
Key: YARN-4983
URL: https://issues.apache.org/jira/browse/YARN-4983
Project: Hadoop YARN
Issue Type: Bug
Reporter: Li Lu
Assignee: Li Lu

When transitioned to standby, the RM shuts down the existing metrics system and launches a new one. As a result, the JVM and UGI metrics are missing from the new metrics system.
[jira] [Created] (YARN-4978) The number of javadocs warnings is limited to 100
Li Lu created YARN-4978:
---

Summary: The number of javadocs warnings is limited to 100
Key: YARN-4978
URL: https://issues.apache.org/jira/browse/YARN-4978
Project: Hadoop YARN
Issue Type: Bug
Reporter: Li Lu

We are generating a lot of javadoc warnings with JDK 1.8. Right now, the number of reported warnings is limited to 100. Raising this limit could reveal more problems in a single run of our javadoc generation process.
[jira] [Created] (YARN-4977) Fix javadocs warnings in yarn-api for jdk 1.8
Li Lu created YARN-4977:
---

Summary: Fix javadocs warnings in yarn-api for jdk 1.8
Key: YARN-4977
URL: https://issues.apache.org/jira/browse/YARN-4977
Project: Hadoop YARN
Issue Type: Bug
Reporter: Li Lu

There are a lot of javadoc warnings in the YARN codebase when it is built with JDK 1.8. We need to fix them. For example, running mvn javadoc:javadoc on yarn-api produces output like the following (partial):
{code}
100 warnings
[WARNING] Javadoc Warnings
[WARNING] Picked up JAVA_TOOL_OPTIONS: -Djava.awt.headless=true
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:97: warning: no description for @throws
[WARNING] * @throws YarnException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:98: warning: no description for @throws
[WARNING] * @throws IOException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:129: warning: no description for @throws
[WARNING] * @throws YarnException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:130: warning: no description for @throws
[WARNING] * @throws IOException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:169: warning: no description for @throws
[WARNING] * @throws YarnException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:170: warning: no description for @throws
[WARNING] * @throws IOException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:202: warning: no description for @throws
[WARNING] * @throws YarnException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:203: warning: no description for @throws
[WARNING] * @throws IOException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:237: warning: no description for @throws
[WARNING] * @throws YarnException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:238: warning: no description for @throws
[WARNING] * @throws IOException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:274: warning: no description for @throws
[WARNING] * @throws YarnException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:275: warning: no description for @throws
[WARNING] * @throws IOException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:296: warning: no description for @throws
[WARNING] * @throws YarnException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:297: warning: no description for @throws
[WARNING] * @throws IOException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:311: warning: no description for @throws
[WARNING] * @throws YarnException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:312: warning: no description for @throws
[WARNING] * @throws IOException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:326: warning: no description for @throws
[WARNING] * @throws YarnException
[WARNING] ^
[WARNING] /Users/llu/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationBaseProtocol.java:327: warning: no description for @throws
[WARNING] * @throws IOException
[WARNING] ^
[WARNING] /Users/llu/hadoop-commo
[jira] [Created] (YARN-4921) Remove deprecated "yarn.timeline-service.hostname" from yarn-default
Li Lu created YARN-4921:
---

Summary: Remove deprecated "yarn.timeline-service.hostname" from yarn-default
Key: YARN-4921
URL: https://issues.apache.org/jira/browse/YARN-4921
Project: Hadoop YARN
Issue Type: Bug
Reporter: Li Lu
[jira] [Created] (YARN-4886) Add HDFS caller context for EntityGroupFSTimelineStore
Li Lu created YARN-4886:
---

Summary: Add HDFS caller context for EntityGroupFSTimelineStore
Key: YARN-4886
URL: https://issues.apache.org/jira/browse/YARN-4886
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu

We need to add an HDFS caller context for the entity group FS storage for better audit log debugging.
[jira] [Created] (YARN-4851) Metric improvements for ATS v1.5 storage
Li Lu created YARN-4851:
---

Summary: Metric improvements for ATS v1.5 storage
Key: YARN-4851
URL: https://issues.apache.org/jira/browse/YARN-4851
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu

We can add more metrics to the ATS v1.5 storage systems, including purging, cache hits/misses, read latency, etc.
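Two of the metrics listed (cache hits/misses and read latency) reduce to a few thread-safe counters plus derived ratios. This sketch uses plain `AtomicLong`s and a hypothetical `CacheMetricsSketch` class; inside Hadoop such counters would more likely be exposed as metrics2 sources (for example `MutableCounterLong` fields annotated with `@Metric`).

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical cache metrics: hits, misses, and cumulative read latency. */
final class CacheMetricsSketch {
  private final AtomicLong hits = new AtomicLong();
  private final AtomicLong misses = new AtomicLong();
  private final AtomicLong reads = new AtomicLong();
  private final AtomicLong readNanos = new AtomicLong();

  void recordLookup(boolean hit) {
    (hit ? hits : misses).incrementAndGet();
  }

  void recordRead(long nanos) {
    reads.incrementAndGet();
    readNanos.addAndGet(nanos);
  }

  /** Fraction of lookups served from the cache; 0 when nothing was looked up. */
  double hitRatio() {
    long h = hits.get();
    long total = h + misses.get();
    return total == 0 ? 0.0 : (double) h / total;
  }

  /** Mean read latency in nanoseconds; 0 when no reads were recorded. */
  double avgReadNanos() {
    long n = reads.get();
    return n == 0 ? 0.0 : (double) readNanos.get() / n;
  }
}
```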
[jira] [Resolved] (YARN-4716) TimelineClient to implement Flushable; propagate to writer
[ https://issues.apache.org/jira/browse/YARN-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu resolved YARN-4716.
-
Resolution: Duplicate

Fixed in YARN-4696.

> TimelineClient to implement Flushable; propagate to writer
> --
>
> Key: YARN-4716
> URL: https://issues.apache.org/jira/browse/YARN-4716
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
>
> I need a {{flush()}} operation in the timeline client. Knowing the lifecycle
> of my app, I do explicitly want to flush events at certain points (app
> start event, app end), without relying on an async flush client.
> Right now, in tests, the time it takes for an event to propagate is flush
> delay + scan delay; you are looking at 2 seconds minimum per test case, with
> less deterministic outcomes.
> In production, those big app lifecycle events are so important that my client code
> currently explicitly flushes my own event queue, and expects them to reach
> the destination.
> With the filesystem writer, I've lost those durability guarantees.
> Implementing {{Flushable.flush()}} would let my app tell the timeline client
> to write; it'd be a no-op on the http client, but for the filesystem writer
> (or any other async writer), it'd be expected to force a write to the durable
> medium.
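The request boils down to writers implementing `java.io.Flushable`: a no-op for a synchronous (HTTP-style) writer and a forced durable write for a buffering (filesystem-style) writer. The sketch below shows that contract with hypothetical classes; the "durable medium" is simulated with a list.

```java
import java.io.Flushable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical writer abstraction; every writer can be flushed explicitly. */
interface EntityWriter extends Flushable {
  void write(String entity) throws IOException;
}

/** Synchronous (HTTP-like) writer: each write is already delivered, so flush is a no-op. */
final class SyncWriter implements EntityWriter {
  final List<String> delivered = new ArrayList<>();

  public void write(String entity) {
    delivered.add(entity);
  }

  public void flush() {
    // nothing is buffered
  }
}

/** Asynchronous (filesystem-like) writer: buffers until flush() forces a durable write. */
final class BufferedWriterSketch implements EntityWriter {
  private final List<String> buffer = new ArrayList<>();
  final List<String> durable = new ArrayList<>();

  public synchronized void write(String entity) {
    buffer.add(entity);
  }

  public synchronized void flush() {
    durable.addAll(buffer); // stand-in for writing and syncing to durable storage
    buffer.clear();
  }
}
```

An application that knows its own lifecycle can then call `flush()` right after posting its start and end events, instead of waiting for flush and scan delays.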
[jira] [Resolved] (YARN-4695) EntityGroupFSTimelineStore to not log errors during shutdown
[ https://issues.apache.org/jira/browse/YARN-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu resolved YARN-4695. - Resolution: Duplicate Fixed in YARN-4696. > EntityGroupFSTimelineStore to not log errors during shutdown > > > Key: YARN-4695 > URL: https://issues.apache.org/jira/browse/YARN-4695 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Li Lu > > # The {{EntityGroupFSTimelineStore}} threads log exceptions that get raised > during their execution. > # The service stops by interrupting all its workers. > # As a result, the workers all log exceptions at ERROR level *even during a managed > shutdown*. > # This creates distracting noise in the logs.
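One possible way to keep a managed shutdown quiet, sketched under the assumption that the service sets a flag before interrupting its workers. `StoreWorkerSketch` and its methods are hypothetical, and the list stands in for a real logger:

```java
import java.io.InterruptedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical worker-failure handler; the list stands in for a real logger.
class StoreWorkerSketch {
    private volatile boolean stopping = false;
    final List<String> log = Collections.synchronizedList(new ArrayList<String>());

    // Called by the service's stop routine before it interrupts the workers.
    void beginShutdown() {
        stopping = true;
    }

    // Called from a worker's catch block.
    void handleFailure(Exception e) {
        boolean interruption = e instanceof InterruptedException
            || e instanceof InterruptedIOException;
        if (stopping && interruption) {
            // Expected during a managed shutdown: keep it out of the error log.
            log.add("DEBUG: worker interrupted during shutdown");
        } else {
            log.add("ERROR: " + e);
        }
    }
}
```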
[jira] [Resolved] (YARN-4780) TestDistributedShell breaks in trunk
[ https://issues.apache.org/jira/browse/YARN-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu resolved YARN-4780. - Resolution: Cannot Reproduce Oops, I can no longer reproduce this after restarting the machine. It seems to have been a false alarm. > TestDistributedShell breaks in trunk > > > Key: YARN-4780 > URL: https://issues.apache.org/jira/browse/YARN-4780 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell, test >Reporter: Li Lu > > When testing TestDistributedShell#testDSShellWithDomain and > TestDistributedShell#testDSShellWithoutDomain, I got test failures because > they timed out. From the application log I can see the following > lines: > {code} > ... > 016-03-08 14:51:39,642 INFO [main] impl.TimelineClientImpl > (TimelineClientImpl.java:serviceInit(296)) - Timeline service address: > http://hw11074.local:64448/ws/v1/timeline/ > 2016-03-08 14:51:39,664 INFO [main] impl.TimelineClientImpl > (TimelineClientImpl.java:logException(212)) - Exception caught by > TimelineClientConnectionRetry, will try 30 more t > ime(s). > Message: java.net.ConnectException: Connection refused > ... > {code} > I checked the test log and confirmed that the timeline server was launched on the correct > port. Several other folks can reproduce this problem as well.
[jira] [Created] (YARN-4780) TestDistributedShell breaks in trunk
Li Lu created YARN-4780: --- Summary: TestDistributedShell breaks in trunk Key: YARN-4780 URL: https://issues.apache.org/jira/browse/YARN-4780 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell, test Reporter: Li Lu When testing TestDistributedShell#testDSShellWithDomain and TestDistributedShell#testDSShellWithoutDomain, I got test failures because they timed out. From the application log I can see the following lines: {code} ... 016-03-08 14:51:39,642 INFO [main] impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(296)) - Timeline service address: http://hw11074.local:64448/ws/v1/timeline/ 2016-03-08 14:51:39,664 INFO [main] impl.TimelineClientImpl (TimelineClientImpl.java:logException(212)) - Exception caught by TimelineClientConnectionRetry, will try 30 more t ime(s). Message: java.net.ConnectException: Connection refused ... {code} I checked the test log and confirmed that the timeline server was launched on the correct port. Several other folks can reproduce this problem as well.
[jira] [Created] (YARN-4748) ApplicationHistoryManagerOnTimelineStore should not swallow exceptions on generateApplicationReport
Li Lu created YARN-4748: --- Summary: ApplicationHistoryManagerOnTimelineStore should not swallow exceptions on generateApplicationReport Key: YARN-4748 URL: https://issues.apache.org/jira/browse/YARN-4748 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Li Lu Assignee: Li Lu We're directly swallowing AuthorizationExceptions and ApplicationAttemptNotFoundExceptions when generating application reports. We should at least log the exception before proceeding with default values (which set the app attempt id to -1).
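A minimal sketch of "log, then fall back" instead of swallowing. `AppReportSketch` and `AttemptLookup` are hypothetical; for brevity the sketch catches `Exception`, whereas the real code would presumably keep catching only the two exception types named above:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the lookup stands in for reading the app attempt from the timeline store.
class AppReportSketch {
    static final long UNKNOWN_ATTEMPT_ID = -1L;

    interface AttemptLookup {
        long latestAttemptId(String appId) throws Exception;
    }

    // Stands in for a real logger.
    final List<String> warnings = new ArrayList<>();

    long resolveAttemptId(String appId, AttemptLookup lookup) {
        try {
            return lookup.latestAttemptId(appId);
        } catch (Exception e) {
            // Record why we fall back instead of silently swallowing the exception.
            warnings.add("Using attempt id " + UNKNOWN_ATTEMPT_ID + " for " + appId + ": " + e);
            return UNKNOWN_ATTEMPT_ID;
        }
    }
}
```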
[jira] [Created] (YARN-4700) ATS storage has one extra record each time the RM got restarted
Li Lu created YARN-4700: --- Summary: ATS storage has one extra record each time the RM got restarted Key: YARN-4700 URL: https://issues.apache.org/jira/browse/YARN-4700 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Li Lu When testing the new web UI for ATS v2, I noticed that each time the RM is restarted, we create one extra record for each finished application that is still held in the RM state store. Quite possibly this is because we add the cluster start timestamp to the default cluster id: since the cluster id is part of the row key, every restart produces a new record for the same application. We need to fix this behavior, probably by having a better default cluster id.
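To illustrate the suspected cause, here is a sketch with a hypothetical row-key layout (the real schema's field order and separator may differ): any cluster id that embeds the RM start time yields a different row key after each restart, while a stable default does not. The `yarn_cluster` default below is illustrative only:

```java
// Hypothetical row-key layout; the real schema's separator and field order may differ.
class RowKeySketch {
    static String appRowKey(String clusterId, String user, String flowName,
                            long flowRunId, String appId) {
        return String.join("!", clusterId, user, flowName, Long.toString(flowRunId), appId);
    }

    // Suspected problem: a default cluster id derived from the RM start time.
    static String timestampedDefaultClusterId(long rmStartTimeMillis) {
        return "yarn_cluster_" + rmStartTimeMillis;
    }

    // Possible fix: a fixed default, so restarts do not change the row key.
    static String stableDefaultClusterId(String configuredClusterId) {
        return configuredClusterId != null ? configuredClusterId : "yarn_cluster";
    }
}
```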
[jira] [Created] (YARN-4688) Allow YARN system metrics publisher to use ATS v1.5 APIs
Li Lu created YARN-4688: --- Summary: Allow YARN system metrics publisher to use ATS v1.5 APIs Key: YARN-4688 URL: https://issues.apache.org/jira/browse/YARN-4688 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu We may want to consider using the ATS v1.5 APIs for the system metrics publisher. The ATS v2 branch contains some contributions that refactor the YARN SMP so that it can work with multiple versions. We may also need to consider merging in this change.
[jira] [Resolved] (YARN-4604) TimelineDataManager should return gracefully when one entity's id or type is empty
[ https://issues.apache.org/jira/browse/YARN-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu resolved YARN-4604. - Resolution: Won't Fix > TimelineDataManager should return gracefully when one entity's id or type is > empty > -- > > Key: YARN-4604 > URL: https://issues.apache.org/jira/browse/YARN-4604 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Li Lu >Assignee: Li Lu > > As discussed in YARN-4596, when the timeline data manager hits an entity > whose id and/or type fields are empty, it should not directly throw an > exception. It should at least let the client side know which entities have > been posted to the timeline server, and which ones haven't.
[jira] [Created] (YARN-4604) TimelineDataManager should return gracefully when one entity's id or type is empty
Li Lu created YARN-4604: --- Summary: TimelineDataManager should return gracefully when one entity's id or type is empty Key: YARN-4604 URL: https://issues.apache.org/jira/browse/YARN-4604 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu As discussed in YARN-4596, when the timeline data manager hits an entity whose id and/or type fields are empty, it should not directly throw an exception. It should at least let the client side know which entities have been posted to the timeline server, and which ones haven't.
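A sketch of the behavior described above, using hypothetical simplified types rather than the real `TimelineEntity` and put-response classes: invalid entities are reported by index and the rest of the batch is still stored:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified entity; not the real TimelineEntity.
class EntitySketch {
    final String id;
    final String type;

    EntitySketch(String id, String type) {
        this.id = id;
        this.type = type;
    }
}

// Hypothetical result: what was stored, and why the rest was rejected.
class PutResultSketch {
    final List<EntitySketch> stored = new ArrayList<>();
    final List<String> errors = new ArrayList<>();
}

class EntityPutterSketch {
    static PutResultSketch put(List<EntitySketch> entities) {
        PutResultSketch result = new PutResultSketch();
        for (int i = 0; i < entities.size(); i++) {
            EntitySketch e = entities.get(i);
            if (isEmpty(e.id) || isEmpty(e.type)) {
                // Report the bad entity instead of aborting the whole batch.
                result.errors.add("entity #" + i + ": missing id or type");
                continue;
            }
            result.stored.add(e);
        }
        return result;
    }

    private static boolean isEmpty(String s) {
        return s == null || s.isEmpty();
    }
}
```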
[jira] [Created] (YARN-4596) SystemMetricPublisher should not swallow error messages from TimelineClient#putEntities
Li Lu created YARN-4596: --- Summary: SystemMetricPublisher should not swallow error messages from TimelineClient#putEntities Key: YARN-4596 URL: https://issues.apache.org/jira/browse/YARN-4596 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Li Lu Assignee: Li Lu We should report the error messages from the returned TimelinePutResponse when posting timeline entities through the system metrics publisher.
[jira] [Created] (YARN-4545) Allow YARN distributed shell to use ATS v1.5 APIs
Li Lu created YARN-4545: --- Summary: Allow YARN distributed shell to use ATS v1.5 APIs Key: YARN-4545 URL: https://issues.apache.org/jira/browse/YARN-4545 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu We can use YARN distributed shell as a demo for the ATS v1.5 APIs. We need to allow distributed shell to post data with the ATS v1.5 APIs when v1.5 is enabled in the system. We also need to provide a sample plugin to read that data back out.
[jira] [Created] (YARN-4460) [Bug fix] RM fails to start when SMP is enabled
Li Lu created YARN-4460: --- Summary: [Bug fix] RM fails to start when SMP is enabled Key: YARN-4460 URL: https://issues.apache.org/jira/browse/YARN-4460 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: YARN-2928 Reporter: Li Lu Assignee: Li Lu When SMP is enabled, the RM fails to start with the following fatal message: {code} FATAL event.AsyncDispatcher (AsyncDispatcher.java:dispatch(189)) - Error in dispatcher thread true java.lang.Exception: No handler for registered for class org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractSystemMetricsPublisher$SystemMetricsEventType at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:185) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) at java.lang.Thread.run(Thread.java:745) {code} We should register the event handlers during the service init stage of TimelineServiceV2Publisher to fix this problem.
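The error suggests an event was dispatched before any handler was registered for its type. A hypothetical miniature dispatcher and publisher (not the real `AsyncDispatcher` API) shows why registering in the init stage avoids this:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical miniature of an event dispatcher keyed by event-type class.
class MiniDispatcher {
    private final Map<Class<?>, Consumer<Object>> handlers = new HashMap<>();

    void register(Class<?> eventType, Consumer<Object> handler) {
        handlers.put(eventType, handler);
    }

    void dispatch(Enum<?> event) {
        Consumer<Object> handler = handlers.get(event.getDeclaringClass());
        if (handler == null) {
            throw new IllegalStateException("No handler registered for " + event.getDeclaringClass());
        }
        handler.accept(event);
    }
}

// Hypothetical publisher service showing where registration belongs.
class PublisherSketch {
    enum MetricsEventType { APP_CREATED, APP_FINISHED }

    final MiniDispatcher dispatcher = new MiniDispatcher();
    final List<Object> published = new ArrayList<>();

    // Register during init, so the handler exists before start() can emit events.
    void serviceInit() {
        dispatcher.register(MetricsEventType.class, published::add);
    }

    void serviceStart() {
        dispatcher.dispatch(MetricsEventType.APP_CREATED);
    }
}
```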
[jira] [Created] (YARN-4445) Unify the term flowId and flowName in timeline v2 codebase
Li Lu created YARN-4445: --- Summary: Unify the term flowId and flowName in timeline v2 codebase Key: YARN-4445 URL: https://issues.apache.org/jira/browse/YARN-4445 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu A flow name alone is not sufficient to identify a flow, yet I noticed we use both "flowName" and "flowId" to refer to the same thing. We need to unify them as flowName. Otherwise, front-end users may assume that the flow id is a top-level concept and try to locate a flow directly by its flow id.
[jira] [Created] (YARN-4444) TimelineClientImpl#doPosting should not fail on AbstractMethodError
Li Lu created YARN-: --- Summary: TimelineClientImpl#doPosting should not fail on AbstractMethodError Key: YARN- URL: https://issues.apache.org/jira/browse/YARN- Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu We noticed application failures caused by incompatible Jackson changes. Although compatibility is the ATS user's responsibility, the timeline client should not fail the whole process on an AbstractMethodError.
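A sketch of the narrow guard this asks for. `SafePosterSketch` and `Poster` are hypothetical; catching an `Error` is normally discouraged, so the catch is limited to `AbstractMethodError`, which the JVM throws when a call reaches an abstract method at run time, as can happen with binary-incompatible libraries:

```java
// Hypothetical wrapper around the posting call; Poster stands in for the HTTP/JSON layer.
class SafePosterSketch {
    interface Poster {
        void post(String payload) throws Exception;
    }

    // Returns true if the post succeeded; never lets AbstractMethodError escape.
    static boolean doPosting(Poster poster, String payload) {
        try {
            poster.post(payload);
            return true;
        } catch (AbstractMethodError e) {
            // Typically a binary-incompatible library on the classpath (e.g. a Jackson mismatch).
            System.err.println("Timeline post failed, incompatible library? " + e);
            return false;
        } catch (Exception e) {
            System.err.println("Timeline post failed: " + e);
            return false;
        }
    }
}
```

Whether to report the failure through a return value, a log line, or a wrapped checked exception is a design choice; the sketch simply logs and returns `false`.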
[jira] [Created] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
Li Lu created YARN-4265: --- Summary: Provide new timeline plugin storage to support fine-grained entity caching Key: YARN-4265 URL: https://issues.apache.org/jira/browse/YARN-4265 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu To support the newly proposed APIs in YARN-4234, we need to create a new plugin timeline store. The store may behave similarly to the EntityFileTimelineStore proposed in YARN-3942, but cache data at cache-id granularity instead of application-id granularity. Let's make this a standalone store, instead of updating EntityFileTimelineStore, to keep the existing store (EntityFileTimelineStore) stable.
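A minimal sketch of caching at cache-id granularity, using the standard `LinkedHashMap` access-order LRU idiom. The key format (`app_1/group_a`) and the class name are hypothetical; the point is that one application's groups are cached and evicted independently:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache keyed by cache id (e.g. an app id plus a group id) instead of app id.
class CacheIdLruSketch<V> extends LinkedHashMap<String, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    CacheIdLruSketch(int maxEntries) {
        super(16, 0.75f, true); // access order: iteration starts at the least recently used entry
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Evict a single cache group, not every entity of its application.
        return size() > maxEntries;
    }
}
```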
[jira] [Created] (YARN-4220) [Storage implementation] Support getEntities with only Application id but no flow and flow run ID
Li Lu created YARN-4220: --- Summary: [Storage implementation] Support getEntities with only Application id but no flow and flow run ID Key: YARN-4220 URL: https://issues.apache.org/jira/browse/YARN-4220 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Currently {{getEntities}} requires the flow id and flow run id to be non-null. We can actually query the appToFlow table to figure out an application's flow id and flow run id if they're missing. This will simplify normal queries.
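A sketch of the proposed fallback, with a `HashMap` standing in for the appToFlow table; the class, field and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the appToFlow table.
class FlowContextResolverSketch {
    static final class FlowContext {
        final String flowName;
        final long flowRunId;

        FlowContext(String flowName, long flowRunId) {
            this.flowName = flowName;
            this.flowRunId = flowRunId;
        }
    }

    private final Map<String, FlowContext> appToFlow = new HashMap<>();

    void recordApp(String appId, String flowName, long flowRunId) {
        appToFlow.put(appId, new FlowContext(flowName, flowRunId));
    }

    // Use the caller's flow context if complete; otherwise look it up by application id.
    FlowContext resolve(String appId, String flowName, Long flowRunId) {
        if (flowName != null && flowRunId != null) {
            return new FlowContext(flowName, flowRunId);
        }
        FlowContext ctx = appToFlow.get(appId);
        if (ctx == null) {
            throw new IllegalArgumentException("No flow context found for " + appId);
        }
        return ctx;
    }
}
```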
[jira] [Created] (YARN-4219) New levelDB cache storage for timeline v1.5
Li Lu created YARN-4219: --- Summary: New levelDB cache storage for timeline v1.5 Key: YARN-4219 URL: https://issues.apache.org/jira/browse/YARN-4219 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu We need an "offline" caching storage for timeline server v1.5 after the changes in YARN-3942. The in-memory timeline storage may run into OOM issues when used as the cache storage for the entity file timeline storage. We can refactor the code and have a LevelDB-based caching storage for this use case.
[jira] [Created] (YARN-4102) Add an "incremental" mode for timeline schema creator
Li Lu created YARN-4102: --- Summary: Add an "incremental" mode for timeline schema creator Key: YARN-4102 URL: https://issues.apache.org/jira/browse/YARN-4102 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu When debugging timeline POCs, we may need to create HBase tables that are added in some ongoing patches. Right now, our schema creator exits as soon as it hits an existing table. While this is the correct behavior for end users, it makes debugging POCs painful: every time, we have to disable all existing tables, drop them, run the schema creator to generate all tables, and regenerate all test data. Maybe we'd like to add an "incremental" mode so that the creator only creates tables that do not exist yet? This would be handy when deploying our POCs. Of course, we need to keep consistency across tables in mind.
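A sketch of the proposed incremental mode, with a `Set` standing in for the tables that already exist in HBase. The class and table names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical schema creator; "existing" stands in for the tables already present in HBase.
class SchemaCreatorSketch {
    static List<String> createTables(Set<String> existing, List<String> wanted, boolean incremental) {
        List<String> created = new ArrayList<>();
        for (String table : wanted) {
            if (existing.contains(table)) {
                if (incremental) {
                    continue; // incremental mode: skip tables that are already there
                }
                // Default mode: stop at the first existing table, as the creator does today.
                throw new IllegalStateException("Table already exists: " + table);
            }
            existing.add(table);
            created.add(table);
        }
        return created;
    }
}
```

The sketch does nothing about cross-table consistency; it only skips tables that are already present.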
[jira] [Created] (YARN-4097) Create POC timeline web UI with new YARN web UI framework
Li Lu created YARN-4097: --- Summary: Create POC timeline web UI with new YARN web UI framework Key: YARN-4097 URL: https://issues.apache.org/jira/browse/YARN-4097 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu As planned, we need to try out the new YARN web UI framework and implement the timeline v2 web UI on top of it. This JIRA proposes to build basic lists of active flows and applications from the timeline data. We can add more content once we are used to this framework.
[jira] [Created] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2
Li Lu created YARN-4061: --- Summary: [Fault tolerance] Fault tolerant writer for timeline v2 Key: YARN-4061 URL: https://issues.apache.org/jira/browse/YARN-4061 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu We need to build a timeline writer that is resilient to backend storage downtime and timeline collector failures.
[jira] [Resolved] (YARN-3595) Performance optimization using connection cache of Phoenix timeline writer
[ https://issues.apache.org/jira/browse/YARN-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu resolved YARN-3595. - Resolution: Later Since we're moving the Phoenix storage to the aggregation side, we need to revisit this once we have the full story of offline (time-based) aggregation. > Performance optimization using connection cache of Phoenix timeline writer > -- > > Key: YARN-3595 > URL: https://issues.apache.org/jira/browse/YARN-3595 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > > The story about the connection cache in Phoenix timeline storage is a little > bit long. In YARN-3033 we planned to have a shared writer layer for all > collectors in the same collector manager. This way we can better reuse the > same heavy-weight storage layer connection, which suits > conventional storage layer connections that are typically heavy-weight. > Phoenix, on the other hand, implements its own connection interface layer to > be light-weight and thread-unsafe. To make these connections work with our > "multiple collector, single writer" model, we're adding a thread-indexed > connection cache. However, many performance-critical factors have yet to be > tested. > In this JIRA we're tracking performance optimization efforts using this > connection cache. Previously we had a draft, but there was one implementation > challenge with cache evictions: there may be races between Guava cache's > removal listener calls (which close the connection) and normal references to > the connection. We need to carefully define the way they synchronize. > Performance-wise, at the very beginning we may need to understand: > # Whether the current, thread-based indexing is an appropriate approach, or whether there are > better ways to index the connections. > # The best size of the cache, presumably as the proposed default value of a > configuration.
> # How long we need to preserve a connection in the cache. > Please feel free to add to this list.
[jira] [Created] (YARN-3904) Adopt PhoenixTimelineWriter into time-based aggregation storage
Li Lu created YARN-3904: --- Summary: Adopt PhoenixTimelineWriter into time-based aggregation storage Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu After we finish the design for time-based aggregation, we can adopt our existing Phoenix storage as the storage for the aggregated data. This JIRA proposes to move the Phoenix storage implementation from o.a.h.yarn.server.timelineservice.storage to o.a.h.yarn.server.timelineservice.aggregation.timebased, and make it a dedicated writer for time-based aggregation.
[jira] [Resolved] (YARN-3899) Add equals and hashCode to TimelineEntity
[ https://issues.apache.org/jira/browse/YARN-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu resolved YARN-3899. - Resolution: Duplicate Duplicate of YARN-3836. > Add equals and hashCode to TimelineEntity > - > > Key: YARN-3899 > URL: https://issues.apache.org/jira/browse/YARN-3899 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > > We need to add equals and hashCode methods to TimelineEntity so that we can > easily tell whether two timeline entities are equal.
[jira] [Created] (YARN-3899) Add equals and hashCode to TimelineEntity
Li Lu created YARN-3899: --- Summary: Add equals and hashCode to TimelineEntity Key: YARN-3899 URL: https://issues.apache.org/jira/browse/YARN-3899 Project: Hadoop YARN Issue Type: Improvement Reporter: Li Lu Assignee: Li Lu We need to add equals and hashCode methods to TimelineEntity so that we can easily tell whether two timeline entities are equal.
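A minimal sketch of the usual `equals`/`hashCode` pair on a hypothetical, simplified entity. Treating (type, id) as the identity is an assumption; the real TimelineEntity carries more fields, and which of them count toward equality is a design decision:

```java
import java.util.Objects;

// Hypothetical simplified entity; identity is assumed to be (type, id).
final class SimpleEntitySketch {
    private final String type;
    private final String id;

    SimpleEntitySketch(String type, String id) {
        this.type = type;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SimpleEntitySketch)) {
            return false;
        }
        SimpleEntitySketch other = (SimpleEntitySketch) o;
        return Objects.equals(type, other.type) && Objects.equals(id, other.id);
    }

    // Must use the same fields as equals() so equal entities hash alike.
    @Override
    public int hashCode() {
        return Objects.hash(type, id);
    }
}
```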
[jira] [Created] (YARN-3765) Fix findbugs the warning in YARN-2928 branch, TimelineMetric
Li Lu created YARN-3765: --- Summary: Fix findbugs the warning in YARN-2928 branch, TimelineMetric Key: YARN-3765 URL: https://issues.apache.org/jira/browse/YARN-3765 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu There is one findbugs warning in the YARN-2928 branch about negating the return value of a comparison. I believe this is a false alarm, since the comparator is intentionally a "reversed" comparator.
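For context, a sketch of the two ways to write a reversed comparator. Negating a `compare()` result is only unsafe when the comparison can return `Integer.MIN_VALUE`, because negating it overflows back to `Integer.MIN_VALUE`; swapping the arguments expresses the same reversed order without negation. `ReversedComparatorsSketch` is a hypothetical name:

```java
import java.util.Comparator;

class ReversedComparatorsSketch {
    // Pattern findbugs warns about: negating the result of compareTo().
    // If a comparison ever returned Integer.MIN_VALUE, negation would leave it negative.
    static <T extends Comparable<T>> Comparator<T> negated() {
        return (a, b) -> -a.compareTo(b);
    }

    // Same reversed ordering without negation: swap the arguments.
    static <T extends Comparable<T>> Comparator<T> swapped() {
        return (a, b) -> b.compareTo(a);
    }
}
```

`Comparator.reverseOrder()` from the JDK gives the same ordering for naturally ordered types such as `Long` timestamps.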
[jira] [Created] (YARN-3702) Timeline service v2 load generator needs to write event id
Li Lu created YARN-3702: --- Summary: Timeline service v2 load generator needs to write event id Key: YARN-3702 URL: https://issues.apache.org/jira/browse/YARN-3702 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu We need to write a sample event id in SimpleEntityWriter so that both the HBase and Phoenix writers can actually write the timeline event. Right now the Phoenix implementation throws exceptions and the HBase implementation skips storing the timeline event.
[jira] [Created] (YARN-3595) Performance optimization for the connection cache of Phoenix timeline writer
Li Lu created YARN-3595: --- Summary: Performance optimization for the connection cache of Phoenix timeline writer Key: YARN-3595 URL: https://issues.apache.org/jira/browse/YARN-3595 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu The story about the connection cache in Phoenix timeline storage is a little bit long. In YARN-3033 we planned to have a shared writer layer for all collectors in the same collector manager. This way we can better reuse the same heavy-weight storage layer connection, which suits conventional storage layer connections that are typically heavy-weight. Phoenix, on the other hand, implements its own connection interface layer to be light-weight and thread-unsafe. To make these connections work with our "multiple collector, single writer" model, we're adding a thread-indexed connection cache. However, many performance-critical factors have yet to be tested. In this JIRA we're tracking performance optimization efforts for this connection cache. At the very beginning we may need to understand: # Whether the current, thread-based indexing is an appropriate approach, or whether there are better ways to index the connections. # The best size of the cache, presumably as the proposed default value of a configuration. # How long we need to preserve a connection in the cache. Please feel free to add to this list.
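A sketch of one way to index light-weight, thread-unsafe connections by thread while avoiding a race between eviction and in-use references: only the owning thread removes and closes its own connection. It uses a plain `ConcurrentHashMap` instead of a Guava cache; all names are hypothetical, and it leaves out the open questions listed above (cache size, how long to keep a connection):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical thread-indexed cache for light-weight, thread-unsafe connections.
class ThreadIndexedConnectionCacheSketch<C extends AutoCloseable> {
    private final ConcurrentHashMap<Long, C> byThread = new ConcurrentHashMap<>();
    private final Supplier<C> factory;

    ThreadIndexedConnectionCacheSketch(Supplier<C> factory) {
        this.factory = factory;
    }

    // Each thread gets, and reuses, its own connection; no two threads share one.
    C get() {
        return byThread.computeIfAbsent(Thread.currentThread().getId(), id -> factory.get());
    }

    // Evict and close only the calling thread's connection, so eviction can never
    // close a connection that another thread is still using.
    void releaseCurrent() throws Exception {
        C conn = byThread.remove(Thread.currentThread().getId());
        if (conn != null) {
            conn.close();
        }
    }

    int size() {
        return byThread.size();
    }
}
```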
[jira] [Resolved] (YARN-3407) HttpServer2 Max threads in TimelineCollectorManager should be more than 10
[ https://issues.apache.org/jira/browse/YARN-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu resolved YARN-3407. - Resolution: Later I'm closing this as "Later"; we can revisit it if we need performance tuning once we reach that phase. > HttpServer2 Max threads in TimelineCollectorManager should be more than 10 > -- > > Key: YARN-3407 > URL: https://issues.apache.org/jira/browse/YARN-3407 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > > Currently TimelineCollectorManager sets HttpServer2.HTTP_MAX_THREADS to just > 10. This value might be too low for serving put requests. By default > HttpServer2 has a max threads value of 250. We can probably make it > configurable too, so that an optimal value can be configured based on the number > of requests coming to the server. Thoughts?
[jira] [Created] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
Li Lu created YARN-3529: --- Summary: Add miniHBase cluster and Phoenix support to ATS v2 unit tests Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch is merged back to trunk.
[jira] [Created] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk
Li Lu created YARN-3459: --- Summary: TestLog4jWarningErrorMetricsAppender breaks in trunk Key: YARN-3459 URL: https://issues.apache.org/jira/browse/YARN-3459 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Blocker Fix For: 2.7.0 TestLog4jWarningErrorMetricsAppender fails with the following message: {code} Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec <<< FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender) Time elapsed: 2.01 sec <<< FAILURE! java.lang.AssertionError: expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89) {code}
[jira] [Created] (YARN-3426) Add jdiff support to YARN
Li Lu created YARN-3426: --- Summary: Add jdiff support to YARN Key: YARN-3426 URL: https://issues.apache.org/jira/browse/YARN-3426 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu We may want to extend our current jdiff support for hadoop-common and HDFS to YARN as well.
[jira] [Created] (YARN-3380) Add protobuf compatibility checker to jenkins test runs
Li Lu created YARN-3380: --- Summary: Add protobuf compatibility checker to jenkins test runs Key: YARN-3380 URL: https://issues.apache.org/jira/browse/YARN-3380 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu We may want to run the protobuf compatibility checker on each incoming patch, to prevent changes that would break rolling upgrades.
[jira] [Created] (YARN-3352) Change distributed shell to use TIMELINE_SERVICE_VERSION
Li Lu created YARN-3352: --- Summary: Change distributed shell to use TIMELINE_SERVICE_VERSION Key: YARN-3352 URL: https://issues.apache.org/jira/browse/YARN-3352 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu After YARN-3034, we have a new global configuration for the active timeline service version. We may want to use that setting in distributed shell, instead of a custom command-line option.
[jira] [Created] (YARN-3330) Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols
Li Lu created YARN-3330: --- Summary: Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols Key: YARN-3330 URL: https://issues.apache.org/jira/browse/YARN-3330 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Per YARN-3292, we may want to start the YARN rolling upgrade compatibility verification tooling with a simple script that checks protobuf compatibility. The script may work on incoming patch files, check whether there are any changes to protobuf files, and report any potentially incompatible changes (line removals, etc.). We may want the tool to be conservative: it may report false positives, but we should minimize its chance of false negatives.
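A deliberately conservative sketch of such a script's core check, written in Java for consistency with the rest of the codebase: it parses a unified diff and reports every removed line in a `.proto` file, including deleted files. The class name and output format are hypothetical, and the header detection is simplistic:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, deliberately conservative checker: every removed line in a .proto file is
// reported, so harmless edits may be flagged (false positives) but removals are not missed.
class ProtoCompatCheckerSketch {
    static List<String> check(String unifiedDiff) {
        List<String> findings = new ArrayList<>();
        String oldFile = null;
        String newFile = null;
        for (String line : unifiedDiff.split("\n")) {
            if (line.startsWith("--- ")) {
                // Start of a new file section.
                oldFile = line.substring(4).trim();
                newFile = null;
            } else if (line.startsWith("+++ ")) {
                newFile = line.substring(4).trim();
            } else if (line.startsWith("-") && (isProto(oldFile) || isProto(newFile))) {
                findings.add(oldFile + ": removed \"" + line.substring(1).trim() + "\"");
            }
        }
        return findings;
    }

    private static boolean isProto(String path) {
        return path != null && path.endsWith(".proto");
    }
}
```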
[jira] [Created] (YARN-3292) [Umbrella] Tests and/or tools for YARN backwards compatibility verification
Li Lu created YARN-3292: --- Summary: [Umbrella] Tests and/or tools for YARN backwards compatibility verification Key: YARN-3292 URL: https://issues.apache.org/jira/browse/YARN-3292 Project: Hadoop YARN Issue Type: Improvement Reporter: Li Lu Assignee: Li Lu YARN-666 added support for YARN rolling upgrades. Supporting this feature required changes in many areas, and those changes rest on many assumptions. Future code changes may break these assumptions by accident, and hence break the YARN rolling upgrade feature. To simplify YARN RU regression tests, we may want to create a set of tools/tests that can verify YARN RU backward compatibility. As a first step, we may want a compatibility checker for important protocols and APIs. We may also want to incorporate these tools into our Jenkins test runs, if necessary.
[jira] [Created] (YARN-3210) Refactor timeline aggregator according to new code organization
Li Lu created YARN-3210: --- Summary: Refactor timeline aggregator according to new code organization Key: YARN-3210 URL: https://issues.apache.org/jira/browse/YARN-3210 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu We may want to refactor the timeline aggregator code according to the discussion in YARN-3166 on code organization for timeline service v2. We need to do this refactoring after we reach an agreement on the aggregator part of YARN-3166.
[jira] [Created] (YARN-3166) Decide detailed package structures for timeline service v2 components
Li Lu created YARN-3166: --- Summary: Decide detailed package structures for timeline service v2 components Key: YARN-3166 URL: https://issues.apache.org/jira/browse/YARN-3166 Project: Hadoop YARN Issue Type: Task Reporter: Li Lu Opening this JIRA to track all discussions on detailed package structures for timeline service v2. This JIRA is for discussion only, so I don't think it should have any assignees. For our current timeline service v2 design, the aggregator (previously called "writer") implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next-gen ATS reader is also a server. Maybe we want to put reader-related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd-party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces in hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions.
[jira] [Created] (YARN-3155) Refactor the exception handling code for TimelineClientImpl's retryOn method
Li Lu created YARN-3155: --- Summary: Refactor the exception handling code for TimelineClientImpl's retryOn method Key: YARN-3155 URL: https://issues.apache.org/jira/browse/YARN-3155 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Minor Now that we have switched to Java 1.7, the exception handling code for the retryOn method can be merged into one catch block (using multi-catch) instead of the current two, to avoid repeated code.
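A sketch of the Java 7 multi-catch this refers to. Which exception types `retryOn` actually catches is an assumption here (`IOException` and `RuntimeException`), and `MultiCatchSketch` is hypothetical:

```java
import java.io.IOException;

// Hypothetical operation and handler; the caught exception types are assumed, not taken from retryOn.
class MultiCatchSketch {
    interface Op {
        String run() throws IOException;
    }

    static String runOnce(Op op) {
        try {
            return op.run();
        } catch (IOException | RuntimeException e) {
            // One shared handler instead of two identical catch blocks.
            return "will retry after " + e.getClass().getSimpleName();
        }
    }
}
```

Multi-catch requires the listed types to be unrelated by subclassing, which holds for `IOException` and `RuntimeException`.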
[jira] [Created] (YARN-2709) Add retry for timeline client getDelegationToken method
Li Lu created YARN-2709: --- Summary: Add retry for timeline client getDelegationToken method Key: YARN-2709 URL: https://issues.apache.org/jira/browse/YARN-2709 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu As mentioned in YARN-2673, we need to add a retry mechanism to the timeline client for secure clusters: if the timeline server is not available, the timeline client needs to retry getting a delegation token.
[jira] [Created] (YARN-2673) Add retry for timeline client
Li Lu created YARN-2673: --- Summary: Add retry for timeline client Key: YARN-2673 URL: https://issues.apache.org/jira/browse/YARN-2673 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu The timeline client currently does not gracefully handle the case where the server is down. Jobs from distributed shell may fail due to an ATS restart. We may need to add some retry mechanisms to the client.
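A minimal fixed-interval retry sketch for the client side; the same wrapper could guard a delegation-token fetch as in YARN-2709. `RetrySketch`, `maxRetries` and `intervalMs` are hypothetical, and treating every `IOException` (which includes `java.net.ConnectException`) as retryable is an assumption:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical fixed-interval retry helper; maxRetries and intervalMs are illustrative.
class RetrySketch {
    static <T> T withRetries(Callable<T> op, int maxRetries, long intervalMs) throws Exception {
        int retried = 0;
        while (true) {
            try {
                return op.call();
            } catch (IOException e) {
                // Connection failures such as java.net.ConnectException are IOExceptions.
                if (retried >= maxRetries) {
                    throw e; // give up after maxRetries retries (maxRetries + 1 attempts)
                }
                retried++;
                Thread.sleep(intervalMs);
            }
        }
    }
}
```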
[jira] [Created] (YARN-2343) Improve
Li Lu created YARN-2343: --- Summary: Improve Key: YARN-2343 URL: https://issues.apache.org/jira/browse/YARN-2343 Project: Hadoop YARN Issue Type: Improvement Reporter: Li Lu Priority: Trivial -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2317) Update documentation about how to write YARN applications
Li Lu created YARN-2317: --- Summary: Update documentation about how to write YARN applications Key: YARN-2317 URL: https://issues.apache.org/jira/browse/YARN-2317 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Some information on the WritingYarnApplications web page is outdated. This document needs to be refreshed to reflect the most recent changes in the YARN APIs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2296) Update Application Master of YARN distributed shell with existing public stable API
[ https://issues.apache.org/jira/browse/YARN-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu resolved YARN-2296. Resolution: Duplicate. Merged into YARN-2295. > Update Application Master of YARN distributed shell with existing public stable API > Key: YARN-2296 > URL: https://issues.apache.org/jira/browse/YARN-2296 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Li Lu -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2296) Update Application Master of YARN distributed shell with existing public stable API
Li Lu created YARN-2296: --- Summary: Update Application Master of YARN distributed shell with existing public stable API Key: YARN-2296 URL: https://issues.apache.org/jira/browse/YARN-2296 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2295) Updating Client of YARN distributed shell with existing public stable API
Li Lu created YARN-2295: --- Summary: Updating Client of YARN distributed shell with existing public stable API Key: YARN-2295 URL: https://issues.apache.org/jira/browse/YARN-2295 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Some API calls in the YARN distributed shell client have been marked as unstable and private. Replace them with existing public stable APIs where possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2294) Update sample program and documentations for writing YARN Application
Li Lu created YARN-2294: --- Summary: Update sample program and documentations for writing YARN Application Key: YARN-2294 URL: https://issues.apache.org/jira/browse/YARN-2294 Project: Hadoop YARN Issue Type: Improvement Reporter: Li Lu Many APIs for writing YARN applications have been stabilized. However, some of them have also changed since the sample YARN programs, such as distributed shell, and the documentation were last updated. There are ongoing discussions on the user mailing list about updating the outdated "Writing YARN Applications" documentation. The sample programs, such as distributed shell, also need updating, since they are likely the very first demonstration of YARN applications that newcomers see. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2242) Improve exception information on AM launch crashes
[ https://issues.apache.org/jira/browse/YARN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu resolved YARN-2242. Resolution: Duplicate. Closed as a duplicate of YARN-2013. > Improve exception information on AM launch crashes > Key: YARN-2242 > URL: https://issues.apache.org/jira/browse/YARN-2242 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Li Lu > Assignee: Li Lu > Fix For: 2.6.0 > Attachments: YARN-2242-070114-1.patch, YARN-2242-070114.patch, YARN-2242-070115-1.patch, YARN-2242-070115-2.patch, YARN-2242-070115.patch > Currently, each time an AM container crashes during launch, both the console and the web UI report only a Shell.ExitCodeException. This is not only unhelpful but sometimes confusing. With log aggregation enabled, container logs are actually aggregated and can be very helpful for debugging. One possible way to improve the whole process is to send a "pointer" to the aggregated logs to the programmer when reporting exception information. -- This message was sent by Atlassian JIRA (v6.2#6252)
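To make the "pointer" idea concrete, the diagnostic reported for a crashed AM container could name the exit code and tell the user how to fetch the aggregated logs, for example with the `yarn logs -applicationId <appId>` command. The sketch below is hypothetical: the class AmLaunchDiagnostics, the method describeAmCrash, and the message wording are illustrative assumptions, not the fix that landed via YARN-2013.

```java
public class AmLaunchDiagnostics {

    // Hypothetical helper: builds a diagnostic that, in addition to the exit
    // code, points the user at the aggregated container logs via the
    // `yarn logs` CLI instead of reporting a bare exception.
    public static String describeAmCrash(String appId, String containerId, int exitCode) {
        StringBuilder sb = new StringBuilder();
        sb.append("AM container ").append(containerId)
          .append(" exited with exitCode: ").append(exitCode).append('\n');
        sb.append("Aggregated container logs may be available via:\n");
        sb.append("  yarn logs -applicationId ").append(appId);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative IDs only; real IDs come from the ResourceManager.
        System.out.println(describeAmCrash(
                "application_1404250000000_0001",
                "container_1404250000000_0001_01_000001",
                1));
    }
}
```

Embedding a ready-to-run command keeps the message useful even when the web UI is unavailable, at the cost of assuming log aggregation is enabled on the cluster.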
[jira] [Created] (YARN-2242) Improve exception information on AM launch crashes
Li Lu created YARN-2242: --- Summary: Improve exception information on AM launch crashes Key: YARN-2242 URL: https://issues.apache.org/jira/browse/YARN-2242 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu -- This message was sent by Atlassian JIRA (v6.2#6252)