[jira] [Created] (MESOS-7626) Create a CI job to publish the website
Vinod Kone created MESOS-7626: - Summary: Create a CI job to publish the website Key: MESOS-7626 URL: https://issues.apache.org/jira/browse/MESOS-7626 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Vinod Kone This job periodically scans for changes to the master branch of `mesos` and publishes an updated website to `mesos-site`. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
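One way such a job could detect changes is to compare the current head of `master` with the commit that was last published. The following is only a sketch under assumptions: it assumes the job stores the last published SHA somewhere and is handed the text printed by `git ls-remote`. The function names and sample SHAs are illustrative and are not taken from the actual CI job.

```python
from typing import Optional


def parse_ls_remote(output: str, ref: str = "refs/heads/master") -> Optional[str]:
    """Return the commit SHA for `ref` from `git ls-remote` output, or None."""
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == ref:
            return parts[0]
    return None


def needs_publish(last_published: Optional[str], head: Optional[str]) -> bool:
    """Publish only when the head is known and differs from the last published SHA."""
    return head is not None and head != last_published


# Made-up sample in the "<sha>\t<ref>" format printed by `git ls-remote <repo>`.
SAMPLE = (
    "1111111111111111111111111111111111111111\tHEAD\n"
    "1111111111111111111111111111111111111111\trefs/heads/master\n"
    "2222222222222222222222222222222222222222\trefs/heads/1.2.x\n"
)

head = parse_ls_remote(SAMPLE)
print(needs_publish(last_published=None, head=head))
print(needs_publish(last_published=head, head=head))
```

Keeping the comparison separate from the git call makes the decision easy to test without network access.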
[jira] [Updated] (MESOS-7624) Move website from svn to git
[ https://issues.apache.org/jira/browse/MESOS-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-7624: -- Shepherd: Vinod Kone Story Points: 3 Sprint: Mesosphere Sprint 57 File a repo request to create the "mesos-site" git repo. Once it is created, we need to move the contents over from the svn repo to the git repo. > Move website from svn to git > > > Key: MESOS-7624 > URL: https://issues.apache.org/jira/browse/MESOS-7624 > Project: Mesos > Issue Type: Task >Reporter: Vinod Kone >Assignee: Vinod Kone > > Move our website svn repo at https://svn.apache.org/repos/asf/mesos/site to a > git repo. > Having a git repo for both the main project and the website allows us to deal with > one version control system. Also, git-based projects are easy to automate via > CI (e.g., git commit) because ASF CI already has the required credentials.
[jira] [Updated] (MESOS-7623) Automatically publish website through CI
[ https://issues.apache.org/jira/browse/MESOS-7623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-7623: -- Sprint: (was: Mesosphere Sprint 57) > Automatically publish website through CI > > > Key: MESOS-7623 > URL: https://issues.apache.org/jira/browse/MESOS-7623 > Project: Mesos > Issue Type: Epic >Reporter: Vinod Kone >Assignee: Vinod Kone > > Currently, publishing the website is a manual process whereby a committer > runs a local docker script, copies the generated `publish` folder to the svn working copy, > and does an `svn commit`. This is both cumbersome and error-prone. > We should automate this process by running it as a CI job.
[jira] [Updated] (MESOS-7623) Automatically publish website through CI
[ https://issues.apache.org/jira/browse/MESOS-7623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-7623: -- Story Points: 8 (was: 1) > Automatically publish website through CI > > > Key: MESOS-7623 > URL: https://issues.apache.org/jira/browse/MESOS-7623 > Project: Mesos > Issue Type: Epic >Reporter: Vinod Kone >Assignee: Vinod Kone > > Currently, publishing the website is a manual process whereby a committer > runs a local docker script, copies the generated `publish` folder to the svn working copy, > and does an `svn commit`. This is both cumbersome and error-prone. > We should automate this process by running it as a CI job.
[jira] [Created] (MESOS-7625) Create script to automate publishing website
Vinod Kone created MESOS-7625: - Summary: Create script to automate publishing website Key: MESOS-7625 URL: https://issues.apache.org/jira/browse/MESOS-7625 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Vinod Kone This script will be run via ASF CI and will be responsible for 1) checking out the latest master branch 2) building Mesos and generating the endpoint help 3) generating the website contents 4) publishing the website by doing a git commit to the `mesos-site` repo
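The four steps listed in this ticket could be laid out as an ordered plan before anything is executed. This is a rough sketch only, not the script the ticket refers to. The two `./...sh` script names are placeholders, the repository URL in the usage lines is made up, and `site_dir` is assumed to be an existing clone of `mesos-site`.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    name: str
    commands: List[List[str]]  # each command is an argv list


def plan_publish(mesos_repo: str, site_dir: str, workdir: str = "work") -> List[Step]:
    """Return the publish steps in order, without executing anything."""
    mesos_dir = f"{workdir}/mesos"
    return [
        # 1) check out the latest master branch
        Step("checkout", [["git", "clone", "--branch", "master", "--depth", "1",
                           mesos_repo, mesos_dir]]),
        # 2) build Mesos and generate the endpoint help (placeholder script name)
        Step("build", [["./build-and-generate-help.sh", mesos_dir]]),
        # 3) generate the website contents into the site clone (placeholder script name)
        Step("generate", [["./generate-site.sh", mesos_dir, site_dir]]),
        # 4) publish with a git commit in the site repo; a real job would
        #    presumably also need to push the commit
        Step("publish", [["git", "-C", site_dir, "add", "--all"],
                         ["git", "-C", site_dir, "commit", "--message",
                          "Updated the website."]]),
    ]


# Hypothetical inputs, used only to print the plan.
for step in plan_publish("https://example.invalid/mesos.git", "work/mesos-site"):
    for argv in step.commands:
        print(step.name, " ".join(argv))
```

Returning the plan as data keeps a dry run trivial: a CI wrapper could print the commands first and only then pass each argv to `subprocess.run`.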
[jira] [Issue Comment Deleted] (MESOS-7623) Automatically publish website through CI
[ https://issues.apache.org/jira/browse/MESOS-7623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-7623: -- Comment: was deleted (was: File a repo request to create "mesos-site" git repo.) > Automatically publish website through CI > > > Key: MESOS-7623 > URL: https://issues.apache.org/jira/browse/MESOS-7623 > Project: Mesos > Issue Type: Epic >Reporter: Vinod Kone >Assignee: Vinod Kone > > Currently, publishing the website is a manual process whereby a committer > runs a local docker script, copies the generated `publish` folder to the svn working copy, > and does an `svn commit`. This is both cumbersome and error-prone. > We should automate this process by running it as a CI job.
[jira] [Updated] (MESOS-7623) Automatically publish website through CI
[ https://issues.apache.org/jira/browse/MESOS-7623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-7623: -- Shepherd: Vinod Kone Story Points: 1 Sprint: Mesosphere Sprint 57 > Automatically publish website through CI > > > Key: MESOS-7623 > URL: https://issues.apache.org/jira/browse/MESOS-7623 > Project: Mesos > Issue Type: Epic >Reporter: Vinod Kone >Assignee: Vinod Kone > > Currently, publishing the website is a manual process whereby a committer > runs a local docker script, copies the generated `publish` folder to the svn working copy, > and does an `svn commit`. This is both cumbersome and error-prone. > We should automate this process by running it as a CI job.
[jira] [Commented] (MESOS-7623) Automatically publish website through CI
[ https://issues.apache.org/jira/browse/MESOS-7623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037816#comment-16037816 ] Vinod Kone commented on MESOS-7623: --- File a repo request to create "mesos-site" git repo. > Automatically publish website through CI > > > Key: MESOS-7623 > URL: https://issues.apache.org/jira/browse/MESOS-7623 > Project: Mesos > Issue Type: Epic >Reporter: Vinod Kone >Assignee: Vinod Kone > > Currently, publishing the website is a manual process whereby a committer > runs a local docker script, copies the generated `publish` folder to the svn working copy, > and does an `svn commit`. This is both cumbersome and error-prone. > We should automate this process by running it as a CI job.
[jira] [Created] (MESOS-7624) Move website from svn to git
Vinod Kone created MESOS-7624: - Summary: Move website from svn to git Key: MESOS-7624 URL: https://issues.apache.org/jira/browse/MESOS-7624 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Vinod Kone Move our website svn repo at https://svn.apache.org/repos/asf/mesos/site to a git repo. Having a git repo for both the main project and the website allows us to deal with one version control system. Also, git-based projects are easy to automate via CI (e.g., git commit) because ASF CI already has the required credentials.
[jira] [Created] (MESOS-7623) Automatically publish website through CI
Vinod Kone created MESOS-7623: - Summary: Automatically publish website through CI Key: MESOS-7623 URL: https://issues.apache.org/jira/browse/MESOS-7623 Project: Mesos Issue Type: Epic Reporter: Vinod Kone Assignee: Vinod Kone Currently, publishing the website is a manual process whereby a committer runs a local docker script, copies the generated `publish` folder to the svn working copy, and does an `svn commit`. This is both cumbersome and error-prone. We should automate this process by running it as a CI job.
[jira] [Assigned] (MESOS-1309) Automate updating the website for a new release
[ https://issues.apache.org/jira/browse/MESOS-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone reassigned MESOS-1309: - Assignee: (was: Vinod Kone) > Automate updating the website for a new release > --- > > Key: MESOS-1309 > URL: https://issues.apache.org/jira/browse/MESOS-1309 > Project: Mesos > Issue Type: Improvement > Components: project website >Reporter: Vinod Kone >Priority: Minor > > This could be a script that lives in our website repo that > 1) updates the links to the latest release, which appear on the > homepage and downloads page > 2) deletes the old release from dist.a.o per MESOS-850
[jira] [Assigned] (MESOS-1309) Automate updating the website for a new release
[ https://issues.apache.org/jira/browse/MESOS-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone reassigned MESOS-1309: - Assignee: Vinod Kone > Automate updating the website for a new release > --- > > Key: MESOS-1309 > URL: https://issues.apache.org/jira/browse/MESOS-1309 > Project: Mesos > Issue Type: Improvement > Components: project website >Reporter: Vinod Kone >Assignee: Vinod Kone >Priority: Minor > > This could be a script that lives in our website repo that > 1) updates the links to the latest release, which appear on the > homepage and downloads page > 2) deletes the old release from dist.a.o per MESOS-850
[jira] [Commented] (MESOS-7621) Fetcher does not handle content length and redirects
[ https://issues.apache.org/jira/browse/MESOS-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037677#comment-16037677 ] Charles Allen commented on MESOS-7621: -- From some basic digging, it looks like https://github.com/apache/mesos/blob/1.2.0/3rdparty/stout/include/stout/net.hpp#L101 issues another request to the same location to get the content length. I do see the following in the logs, even though the files download successfully: {code} [1B blob data] HTTP/1.1 403 Forbidden x-amz-request-id: REQUEST_ID_REDACTED x-amz-id-2: ID_REDACTED= Content-Type: application/xml Transfer-Encoding: chunked Date: Mon, 05 Jun 2017 18:25:45 GMT Server: AmazonS3 {code} > Fetcher does not handle content length and redirects > > > Key: MESOS-7621 > URL: https://issues.apache.org/jira/browse/MESOS-7621 > Project: Mesos > Issue Type: Bug > Components: fetcher >Affects Versions: 1.2.0 >Reporter: Charles Allen > > {code} > $ curl -L -v -O -s http://HOSTNAME_REDACTED/PATH_REDACTED.tar.gz > * Trying 172.17.4.10... > * Connected to HOSTNAME_REDACTED (172.17.4.10) port 80 (#0) > > GET /PATH_REDACTED.tar.gz HTTP/1.1 > > Host: HOSTNAME_REDACTED > > User-Agent: curl/7.43.0 > > Accept: */* > > > < HTTP/1.1 302 FOUND > < Server: nginx/1.4.6 (Ubuntu) > < Date: Mon, 05 Jun 2017 17:58:04 GMT > < Content-Type: text/html; charset=utf-8 > < Content-Length: 1947 > < Connection: keep-alive > < Location: > https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED?Signature=REDACTED%3D&Expires=1496689084&AWSAccessKeyId=KEY_REDACTED&x-amz-security-token=TOKEN_REDACTED%3D > < > * Ignoring the response-body > { [309 bytes data] > * Connection #0 to host HOSTNAME_REDACTED left intact > * Issue another request to this URL: > 'https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED.tar.gz?Signature=SIGNATURE_REDACTED%3D&Expires=1496689084&AWSAccessKeyId=KEY_REDACTED&x-amz-security-token=TOKEN_REDACTED%3D' > * Trying 54.231.40.75... 
> * Connected to BUCKET_REDACTED.s3.amazonaws.com (54.231.40.75) port 443 (#1) > * TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 > * Server certificate: *.s3.amazonaws.com > * Server certificate: DigiCert Baltimore CA-2 G2 > * Server certificate: Baltimore CyberTrust Root > > GET > > /PATH_REDACTED.tar.gz?Signature=REDACTED&Expires=1496689084&AWSAccessKeyId=KEY_REDACTED&x-amz-security-token=TOKEN_REDACTED%3D > > HTTP/1.1 > > Host: BUCKET_REDACTED.s3.amazonaws.com > > User-Agent: curl/7.43.0 > > Accept: */* > > > < HTTP/1.1 200 OK > < x-amz-id-2: ID_REDACTED= > < x-amz-request-id: REQUEST_ID_REDACTED > < Date: Mon, 05 Jun 2017 17:58:07 GMT > < Last-Modified: Thu, 01 Jun 2017 03:04:49 GMT > < ETag: "ETAG_REDACTED" > < Accept-Ranges: bytes > < Content-Type: application/x-tar > < Content-Length: 208245664 > < Server: AmazonS3 > < > { [16360 bytes data] > {code} > We have a micro-service which signs temporary urls for services which can't > speak natively with S3. The above is an example download using {{curl}}. But > when using the mesos fetcher the agent logs contain the following information: > {code} > fetcher.cpp:479] Reverting to fetching directly into the sandbox for > 'http://HOST_REDACTED/PATH_REDACTED.tar.gz', due to failure to fetch through > the cache, with error: Could not determine size of cache file for > 'USER_REDACTED@http://HOST_REDACTED/PATH_REDACTED.tar.gz' with error: No URL > content-length available > {code} > Any idea why this error would occur? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
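To illustrate why a size lookup can fail while the download itself succeeds, the sketch below picks the content length from the last response of a redirect chain and ignores the length of the redirect body. It is not the fetcher's implementation. The first chain mirrors the curl trace in this report. The second chain is an assumption about what a separate size probe might see, since the report shows only the 403 response and not which request triggered it.

```python
from typing import Dict, List, Optional, Tuple

Hop = Tuple[int, Dict[str, str]]  # (HTTP status, response headers)


def final_content_length(hops: List[Hop]) -> Optional[int]:
    """Return Content-Length of the last response if it is a 2xx, else None."""
    if not hops:
        return None
    status, headers = hops[-1]
    if not 200 <= status < 300:
        return None
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get("content-length")
    return int(value) if value is not None and value.isdigit() else None


# Redirect chain as seen in the curl trace: 302 from the signer, then 200 from S3.
curl_hops = [
    (302, {"Content-Length": "1947", "Location": "https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED"}),
    (200, {"Content-Length": "208245664", "Content-Type": "application/x-tar"}),
]

# Assumed chain for a separate size probe that ends in the logged 403.
probe_hops = [
    (302, {"Content-Length": "1947", "Location": "https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED"}),
    (403, {"Transfer-Encoding": "chunked", "Content-Type": "application/xml"}),
]

print(final_content_length(curl_hops))
print(final_content_length(probe_hops))
```

Under these assumptions the probe yields no length at all, which would be consistent with the "No URL content-length available" message, although the report does not confirm that this is the cause.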
[jira] [Updated] (MESOS-7621) Fetcher does not handle content length and redirects
[ https://issues.apache.org/jira/browse/MESOS-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated MESOS-7621: - Summary: Fetcher does not handle content length and redirects (was: Fetcher does not handle content length in redirects) > Fetcher does not handle content length and redirects > > > Key: MESOS-7621 > URL: https://issues.apache.org/jira/browse/MESOS-7621 > Project: Mesos > Issue Type: Bug > Components: fetcher >Affects Versions: 1.2.0 >Reporter: Charles Allen > > {code} > $ curl -L -v -O -s http://HOSTNAME_REDACTED/PATH_REDACTED.tar.gz > * Trying 172.17.4.10... > * Connected to HOSTNAME_REDACTED (172.17.4.10) port 80 (#0) > > GET /PATH_REDACTED.tar.gz HTTP/1.1 > > Host: HOSTNAME_REDACTED > > User-Agent: curl/7.43.0 > > Accept: */* > > > < HTTP/1.1 302 FOUND > < Server: nginx/1.4.6 (Ubuntu) > < Date: Mon, 05 Jun 2017 17:58:04 GMT > < Content-Type: text/html; charset=utf-8 > < Content-Length: 1947 > < Connection: keep-alive > < Location: > https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED?Signature=REDACTED%3D&Expires=1496689084&AWSAccessKeyId=KEY_REDACTED&x-amz-security-token=TOKEN_REDACTED%3D > < > * Ignoring the response-body > { [309 bytes data] > * Connection #0 to host HOSTNAME_REDACTED left intact > * Issue another request to this URL: > 'https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED.tar.gz?Signature=SIGNATURE_REDACTED%3D&Expires=1496689084&AWSAccessKeyId=KEY_REDACTED&x-amz-security-token=TOKEN_REDACTED%3D' > * Trying 54.231.40.75... 
> * Connected to BUCKET_REDACTED.s3.amazonaws.com (54.231.40.75) port 443 (#1) > * TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 > * Server certificate: *.s3.amazonaws.com > * Server certificate: DigiCert Baltimore CA-2 G2 > * Server certificate: Baltimore CyberTrust Root > > GET > > /PATH_REDACTED.tar.gz?Signature=REDACTED&Expires=1496689084&AWSAccessKeyId=KEY_REDACTED&x-amz-security-token=TOKEN_REDACTED%3D > > HTTP/1.1 > > Host: BUCKET_REDACTED.s3.amazonaws.com > > User-Agent: curl/7.43.0 > > Accept: */* > > > < HTTP/1.1 200 OK > < x-amz-id-2: ID_REDACTED= > < x-amz-request-id: REQUEST_ID_REDACTED > < Date: Mon, 05 Jun 2017 17:58:07 GMT > < Last-Modified: Thu, 01 Jun 2017 03:04:49 GMT > < ETag: "ETAG_REDACTED" > < Accept-Ranges: bytes > < Content-Type: application/x-tar > < Content-Length: 208245664 > < Server: AmazonS3 > < > { [16360 bytes data] > {code} > We have a micro-service which signs temporary urls for services which can't > speak natively with S3. The above is an example download using {{curl}}. But > when using the mesos fetcher the agent logs contain the following information: > {code} > fetcher.cpp:479] Reverting to fetching directly into the sandbox for > 'http://HOST_REDACTED/PATH_REDACTED.tar.gz', due to failure to fetch through > the cache, with error: Could not determine size of cache file for > 'USER_REDACTED@http://HOST_REDACTED/PATH_REDACTED.tar.gz' with error: No URL > content-length available > {code} > Any idea why this error would occur? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7572) Attach latest symlink when executor is registered.
[ https://issues.apache.org/jira/browse/MESOS-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Wood updated MESOS-7572: -- Description: This will assist framework developers in making features that need to access the latest sandbox when hitting various operator API endpoints. (was: This will assist framework developers in making features that need to access the latest sandbox when hitting various operator API endpoints. https://reviews.apache.org/r/59641/) > Attach latest symlink when executor is registered. > -- > > Key: MESOS-7572 > URL: https://issues.apache.org/jira/browse/MESOS-7572 > Project: Mesos > Issue Type: Improvement > Components: agent, HTTP API, master >Reporter: Aaron Wood >Assignee: Aaron Wood > > This will assist framework developers in making features that need to access > the latest sandbox when hitting various operator API endpoints. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7572) Attach latest symlink when executor is registered.
[ https://issues.apache.org/jira/browse/MESOS-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037664#comment-16037664 ] Aaron Wood commented on MESOS-7572: --- https://reviews.apache.org/r/59641/ > Attach latest symlink when executor is registered. > -- > > Key: MESOS-7572 > URL: https://issues.apache.org/jira/browse/MESOS-7572 > Project: Mesos > Issue Type: Improvement > Components: agent, HTTP API, master >Reporter: Aaron Wood >Assignee: Aaron Wood > > This will assist framework developers in making features that need to access > the latest sandbox when hitting various operator API endpoints. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7572) Attach latest symlink when executor is registered.
[ https://issues.apache.org/jira/browse/MESOS-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Wood updated MESOS-7572: -- Description: This will assist framework developers in making features that need to access the latest sandbox when hitting various operator API endpoints. https://reviews.apache.org/r/59641/ was: The main benefit of following symlinks in endpoints such as {code}/files{code} is that frameworks will be able to construct a path to the sandbox much more easily. This will assist framework developers in making features that need to provide a path when hitting various operator API endpoints. Currently, using a path ending in {code}runs/latest{code} returns a 404. One such application could be a scheduler providing the ability for users to work with their task's sandbox directly without going to the Mesos UI, API endpoints, or the actual system themselves. https://reviews.apache.org/r/59641/ > Attach latest symlink when executor is registered. > -- > > Key: MESOS-7572 > URL: https://issues.apache.org/jira/browse/MESOS-7572 > Project: Mesos > Issue Type: Improvement > Components: agent, HTTP API, master >Reporter: Aaron Wood >Assignee: Aaron Wood > > This will assist framework developers in making features that need to access > the latest sandbox when hitting various operator API endpoints. > https://reviews.apache.org/r/59641/
[jira] [Updated] (MESOS-7572) Attach latest symlink when executor is registered.
[ https://issues.apache.org/jira/browse/MESOS-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Wood updated MESOS-7572: -- Summary: Attach latest symlink when executor is registered. (was: Follow symlinks when resolving paths in the various master/agent endpoints) > Attach latest symlink when executor is registered. > -- > > Key: MESOS-7572 > URL: https://issues.apache.org/jira/browse/MESOS-7572 > Project: Mesos > Issue Type: Improvement > Components: agent, HTTP API, master >Reporter: Aaron Wood >Assignee: Aaron Wood > > The main benefit of following symlinks in endpoints such as > {code}/files{code} is that frameworks will be able to construct a path to the > sandbox much more easily. This will assist framework developers in making features > that need to provide a path when hitting various operator API endpoints. > Currently, using a path ending in {code}runs/latest{code} returns a > 404. > One such application could be a scheduler providing the ability for users to > work with their task's sandbox directly without going to the Mesos UI, API > endpoints, or the actual system themselves. > https://reviews.apache.org/r/59641/
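For illustration, this is roughly how a framework might build a sandbox path ending in `runs/latest` and turn it into a `/files` request. The sketch assumes the usual agent work directory layout and a `/files/browse` endpoint that takes a `path` query parameter. All IDs and the agent address below are made up.

```python
import posixpath
from urllib.parse import urlencode


def latest_sandbox_path(work_dir: str, agent_id: str, framework_id: str,
                        executor_id: str) -> str:
    """Build the sandbox path that goes through the `latest` symlink."""
    return posixpath.join(
        work_dir, "slaves", agent_id,
        "frameworks", framework_id,
        "executors", executor_id,
        "runs", "latest",
    )


def browse_url(agent: str, path: str) -> str:
    """Build a browse URL for `path` on the agent (assumed endpoint shape)."""
    return f"http://{agent}/files/browse?{urlencode({'path': path})}"


# Hypothetical IDs; a framework would take these from its own task state.
path = latest_sandbox_path("/tmp/slave", "AGENT_ID", "FRAMEWORK_ID", "EXECUTOR_ID")
print(path)
print(browse_url("agent.example.invalid:5051", path))
```

The point of the `latest` symlink is that the framework does not need to know the container run ID to build this path.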
[jira] [Commented] (MESOS-7566) Master crash due to failed check in DRFSorter::remove
[ https://issues.apache.org/jira/browse/MESOS-7566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037599#comment-16037599 ] Benjamin Mahler commented on MESOS-7566: [~xujyan] can you file a ticket for the race you described? It isn't the issue in this ticket AFAICT, but we should capture it and fix it as well. > Master crash due to failed check in DRFSorter::remove > - > > Key: MESOS-7566 > URL: https://issues.apache.org/jira/browse/MESOS-7566 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.1.1, 1.1.2 >Reporter: Zhitao Li >Priority: Critical > > A check in [sorter.cpp#L355 in 1.1.2 | > https://github.com/apache/mesos/blob/1.1.2/src/master/allocator/sorter/drf/sorter.cpp#L355] > is triggered occasionally in our cluster and crashes the master leader. > I manually modified that check to print out the related variables, and the > following is a master log. > https://gist.github.com/zhitaoli/0662d9fe1f6d57de344951c05b536bad#file-gistfile1-txt > From the log, it seems the check was using a stale revocable CPU value of > {{26}} while the new value had been updated to 25, so the check failed. > So far, the two verified occurrences of this bug were both observed near an > {{UNRESERVE}} operation (see lines above in the log).
[jira] [Commented] (MESOS-7566) Master crash due to failed check in DRFSorter::remove
[ https://issues.apache.org/jira/browse/MESOS-7566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037576#comment-16037576 ] Benjamin Mahler commented on MESOS-7566: For posterity, line 773 in [~zhitao]'s version corresponds to: https://github.com/apache/mesos/blob/1.1.2/src/master/allocator/mesos/hierarchical.cpp#L749 > Master crash due to failed check in DRFSorter::remove > - > > Key: MESOS-7566 > URL: https://issues.apache.org/jira/browse/MESOS-7566 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.1.1, 1.1.2 >Reporter: Zhitao Li >Priority: Critical > > A check in [sorter.cpp#L355 in 1.1.2 | > https://github.com/apache/mesos/blob/1.1.2/src/master/allocator/sorter/drf/sorter.cpp#L355] > is triggered occasionally in our cluster and crashes the master leader. > I manually modified that check to print out the related variables, and the > following is a master log. > https://gist.github.com/zhitaoli/0662d9fe1f6d57de344951c05b536bad#file-gistfile1-txt > From the log, it seems the check was using a stale revocable CPU value of > {{26}} while the new value had been updated to 25, so the check failed. > So far, the two verified occurrences of this bug were both observed near an > {{UNRESERVE}} operation (see lines above in the log).
[jira] [Commented] (MESOS-7566) Master crash due to failed check in DRFSorter::remove
[ https://issues.apache.org/jira/browse/MESOS-7566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037565#comment-16037565 ] Zhitao Li commented on MESOS-7566: -- A similar but maybe not identical crash's stack trace reported by gdb: https://gist.github.com/zhitaoli/180f7aa3c619dab44db19af92fd7d3a1 This is a slightly patched version of `1.1.2`. > Master crash due to failed check in DRFSorter::remove > - > > Key: MESOS-7566 > URL: https://issues.apache.org/jira/browse/MESOS-7566 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.1.1, 1.1.2 >Reporter: Zhitao Li >Priority: Critical > > A check in [sorter.cpp#L355 in 1.1.2 | > https://github.com/apache/mesos/blob/1.1.2/src/master/allocator/sorter/drf/sorter.cpp#L355] > is triggered occasionally in our cluster and crashes the master leader. > I manually modified that check to print out the related variables, and the > following is a master log. > https://gist.github.com/zhitaoli/0662d9fe1f6d57de344951c05b536bad#file-gistfile1-txt > From the log, it seems the check was using a stale revocable CPU value of > {{26}} while the new value had been updated to 25, so the check failed. > So far, the two verified occurrences of this bug were both observed near an > {{UNRESERVE}} operation (see lines above in the log).
[jira] [Comment Edited] (MESOS-7566) Master crash due to failed check in DRFSorter::remove
[ https://issues.apache.org/jira/browse/MESOS-7566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037565#comment-16037565 ] Zhitao Li edited comment on MESOS-7566 at 6/5/17 8:50 PM: -- A similar but maybe not identical crash's stack trace reported by gdb: https://gist.github.com/zhitaoli/180f7aa3c619dab44db19af92fd7d3a1 This is a slightly patched version of {{1.1.2}}. was (Author: zhitao): A similar but maybe not identical crash's stack trace reported by gdb: https://gist.github.com/zhitaoli/180f7aa3c619dab44db19af92fd7d3a1 This is a slightly patched version of `1.1.2`. > Master crash due to failed check in DRFSorter::remove > - > > Key: MESOS-7566 > URL: https://issues.apache.org/jira/browse/MESOS-7566 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.1.1, 1.1.2 >Reporter: Zhitao Li >Priority: Critical > > A check in [sorter.cpp#L355 in 1.1.2 | > https://github.com/apache/mesos/blob/1.1.2/src/master/allocator/sorter/drf/sorter.cpp#L355] > is triggered occasionally in our cluster and crashes the master leader. > I manually modified that check to print out the related variables, and the > following is a master log. > https://gist.github.com/zhitaoli/0662d9fe1f6d57de344951c05b536bad#file-gistfile1-txt > From the log, it seems the check was using a stale revocable CPU value of > {{26}} while the new value had been updated to 25, so the check failed. > So far, the two verified occurrences of this bug were both observed near an > {{UNRESERVE}} operation (see lines above in the log).
[jira] [Comment Edited] (MESOS-7423) Add s390x builds to Mesos CI
[ https://issues.apache.org/jira/browse/MESOS-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037444#comment-16037444 ] Vinod Kone edited comment on MESOS-7423 at 6/5/17 7:34 PM: --- [~Nayana] Can you get `pip` and `docker` installed on these VMs? They're needed for our CI jobs. was (Author: vinodkone): [~Nayana] Can you get `pip` installed on these VMs? It's needed for our CI jobs (in addition to `docker` which I'm assuming is already installed) > Add s390x builds to Mesos CI > > > Key: MESOS-7423 > URL: https://issues.apache.org/jira/browse/MESOS-7423 > Project: Mesos > Issue Type: Task >Reporter: Nayana Thorat > > Hi Vinod, > We had raised an issue to add s390x support for Mesos, which was fixed and > resolved. > https://issues.apache.org/jira/browse/MESOS-6742 > We also want to know about Mesos CI. > We need the following details about the current Mesos CI: > 1. What does the current Mesos CI infrastructure use? Travis/Jenkins? > 2. Can Mesos CI be extended to support s390x systems? > We are not sure if this is the right channel to discuss this topic. > Please let us know if you want to start this discussion on some other channel. > Thanks,
[jira] [Commented] (MESOS-7423) Add s390x builds to Mesos CI
[ https://issues.apache.org/jira/browse/MESOS-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037444#comment-16037444 ] Vinod Kone commented on MESOS-7423: --- [~Nayana] Can you get `pip` installed on these VMs? It's needed for our CI jobs (in addition to `docker` which I'm assuming is already installed) > Add s390x builds to Mesos CI > > > Key: MESOS-7423 > URL: https://issues.apache.org/jira/browse/MESOS-7423 > Project: Mesos > Issue Type: Task >Reporter: Nayana Thorat > > Hi Vinod, > We had raised an issue to add s390x support for Mesos, which was fixed and > resolved. > https://issues.apache.org/jira/browse/MESOS-6742 > We also want to know about Mesos CI. > We need the following details about the current Mesos CI: > 1. What does the current Mesos CI infrastructure use? Travis/Jenkins? > 2. Can Mesos CI be extended to support s390x systems? > We are not sure if this is the right channel to discuss this topic. > Please let us know if you want to start this discussion on some other channel. > Thanks,
[jira] [Updated] (MESOS-7423) Add s390x builds to Mesos CI
[ https://issues.apache.org/jira/browse/MESOS-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-7423: -- Summary: Add s390x builds to Mesos CI (was: Information on Mesos CI) I'm updating the title of this ticket to capture the work needed to enable s390x builds in CI. > Add s390x builds to Mesos CI > > > Key: MESOS-7423 > URL: https://issues.apache.org/jira/browse/MESOS-7423 > Project: Mesos > Issue Type: Task >Reporter: Nayana Thorat > > Hi Vinod, > We had raised an issue to add s390x support for Mesos, which was fixed and > resolved. > https://issues.apache.org/jira/browse/MESOS-6742 > We also want to know about Mesos CI. > We need the following details about the current Mesos CI: > 1. What does the current Mesos CI infrastructure use? Travis/Jenkins? > 2. Can Mesos CI be extended to support s390x systems? > We are not sure if this is the right channel to discuss this topic. > Please let us know if you want to start this discussion on some other channel. > Thanks,
[jira] [Updated] (MESOS-7622) Agent can crash if a HTTP executor tries to retry subscription in running state.
[ https://issues.apache.org/jira/browse/MESOS-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anand Mazumdar updated MESOS-7622:

Target Version/s: 1.2.2, 1.3.1

> Agent can crash if a HTTP executor tries to retry subscription in running state.
>
> Key: MESOS-7622
> URL: https://issues.apache.org/jira/browse/MESOS-7622
> Project: Mesos
> Issue Type: Bug
> Components: agent, executor
> Reporter: Aaron Wood
> Assignee: Anand Mazumdar
> Priority: Blocker
>
> It is possible that a running executor might retry its subscribe request. This can lead to a crash if it previously had any launched tasks. Note that the executor would still be able to subscribe again when the agent process restarts and is recovering.
> {code}
> sudo ./mesos-agent --master=10.0.2.15:5050 --work_dir=/tmp/slave --isolation=cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime --image_providers=docker --image_provisioner_backend=overlay --containerizers=mesos --launcher_dir=$(pwd) --executor_environment_variables='{"LD_LIBRARY_PATH": "/home/aaron/Code/src/mesos/build/src/.libs"}'
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0605 14:58:23.748180 10710 main.cpp:323] Build: 2017-06-02 17:09:05 UTC by aaron
> I0605 14:58:23.748252 10710 main.cpp:324] Version: 1.4.0
> I0605 14:58:23.755409 10710 systemd.cpp:238] systemd version `232` detected
> I0605 14:58:23.755450 10710 main.cpp:433] Initializing systemd state
> I0605 14:58:23.763049 10710 systemd.cpp:326] Started systemd slice `mesos_executors.slice`
> I0605 14:58:23.763777 10710 resolver.cpp:69] Creating default secret resolver
> I0605 14:58:23.764214 10710 containerizer.cpp:230] Using isolation: cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime,volume/image,environment_secret
> I0605 14:58:23.767192 10710 linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> E0605 14:58:23.770179 10710 shell.hpp:107] Command 'hadoop version 2>&1' failed; this is the output:
> sh: 1: hadoop: not found
> I0605 14:58:23.770217 10710 fetcher.cpp:69] Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was either not found or exited with a non-zero exit status: 127
> I0605 14:58:23.770643 10710 provisioner.cpp:255] Using default backend 'overlay'
> I0605 14:58:23.785892 10710 slave.cpp:248] Mesos agent started on (1)@127.0.1.1:5051
> I0605 14:58:23.785957 10710 slave.cpp:249] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_environment_variables="{"LD_LIBRARY_PATH":"\/home\/aaron\/Code\/src\/mesos\/build\/src\/.libs"}" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_heartbeat_interval="30secs" --image_providers="docker" --image_provisioner_backend="overlay" --initialize_driver_logging="true" --isolation="cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime" --launcher="linux" --launcher_dir="/home/aaron/Code/src/mesos/build/src" --logbufsecs="0" --logging_level="INFO" --master="10.0.2.15:5050" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/r
[jira] [Updated] (MESOS-7622) Agent can crash if a HTTP executor tries to retry subscription in running state.
[ https://issues.apache.org/jira/browse/MESOS-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anand Mazumdar updated MESOS-7622:

Description:

It is possible that a running executor might retry its subscribe request. This can lead to a crash if it previously had any launched tasks. Note that the executor would still be able to subscribe again when the agent process restarts and is recovering.

{code}
sudo ./mesos-agent --master=10.0.2.15:5050 --work_dir=/tmp/slave --isolation=cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime --image_providers=docker --image_provisioner_backend=overlay --containerizers=mesos --launcher_dir=$(pwd) --executor_environment_variables='{"LD_LIBRARY_PATH": "/home/aaron/Code/src/mesos/build/src/.libs"}'
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0605 14:58:23.748180 10710 main.cpp:323] Build: 2017-06-02 17:09:05 UTC by aaron
I0605 14:58:23.748252 10710 main.cpp:324] Version: 1.4.0
I0605 14:58:23.755409 10710 systemd.cpp:238] systemd version `232` detected
I0605 14:58:23.755450 10710 main.cpp:433] Initializing systemd state
I0605 14:58:23.763049 10710 systemd.cpp:326] Started systemd slice `mesos_executors.slice`
I0605 14:58:23.763777 10710 resolver.cpp:69] Creating default secret resolver
I0605 14:58:23.764214 10710 containerizer.cpp:230] Using isolation: cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime,volume/image,environment_secret
I0605 14:58:23.767192 10710 linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
E0605 14:58:23.770179 10710 shell.hpp:107] Command 'hadoop version 2>&1' failed; this is the output:
sh: 1: hadoop: not found
I0605 14:58:23.770217 10710 fetcher.cpp:69] Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was either not found or exited with a non-zero exit status: 127
I0605 14:58:23.770643 10710 provisioner.cpp:255] Using default backend 'overlay'
I0605 14:58:23.785892 10710 slave.cpp:248] Mesos agent started on (1)@127.0.1.1:5051
I0605 14:58:23.785957 10710 slave.cpp:249] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_environment_variables="{"LD_LIBRARY_PATH":"\/home\/aaron\/Code\/src\/mesos\/build\/src\/.libs"}" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_heartbeat_interval="30secs" --image_providers="docker" --image_provisioner_backend="overlay" --initialize_driver_logging="true" --isolation="cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime" --launcher="linux" --launcher_dir="/home/aaron/Code/src/mesos/build/src" --logbufsecs="0" --logging_level="INFO" --master="10.0.2.15:5050" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/slave"
I0605 14:58:23.786392 10710 slave.cpp:552] Agent resources: cpus(*):6; mem(*):6956; disk(*):41113; ports(*):[31000-32000]
I0605 14:58:23.786437 10710 slave.cpp:560] Agent attributes: [ ]
I0605 14:58:23.786468 10710 slave.cpp:565] Agent hostname: U64
I0605 14:58:23.786574 10714 status_update_manager.cpp:177] Pausing sending status updates
I0605 14:58:23.787470 10718 state.cpp:62] Recovering state from '/tmp/slave/
[jira] [Updated] (MESOS-7622) Agent can crash if a HTTP executor tries to retry subscription in running state.
[ https://issues.apache.org/jira/browse/MESOS-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anand Mazumdar updated MESOS-7622:

Description:

It is possible that a running executor might retry its subscribe request. This can lead to a crash if it previously had any launched tasks.

{code}
sudo ./mesos-agent --master=10.0.2.15:5050 --work_dir=/tmp/slave --isolation=cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime --image_providers=docker --image_provisioner_backend=overlay --containerizers=mesos --launcher_dir=$(pwd) --executor_environment_variables='{"LD_LIBRARY_PATH": "/home/aaron/Code/src/mesos/build/src/.libs"}'
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0605 14:58:23.748180 10710 main.cpp:323] Build: 2017-06-02 17:09:05 UTC by aaron
I0605 14:58:23.748252 10710 main.cpp:324] Version: 1.4.0
I0605 14:58:23.755409 10710 systemd.cpp:238] systemd version `232` detected
I0605 14:58:23.755450 10710 main.cpp:433] Initializing systemd state
I0605 14:58:23.763049 10710 systemd.cpp:326] Started systemd slice `mesos_executors.slice`
I0605 14:58:23.763777 10710 resolver.cpp:69] Creating default secret resolver
I0605 14:58:23.764214 10710 containerizer.cpp:230] Using isolation: cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime,volume/image,environment_secret
I0605 14:58:23.767192 10710 linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
E0605 14:58:23.770179 10710 shell.hpp:107] Command 'hadoop version 2>&1' failed; this is the output:
sh: 1: hadoop: not found
I0605 14:58:23.770217 10710 fetcher.cpp:69] Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was either not found or exited with a non-zero exit status: 127
I0605 14:58:23.770643 10710 provisioner.cpp:255] Using default backend 'overlay'
I0605 14:58:23.785892 10710 slave.cpp:248] Mesos agent started on (1)@127.0.1.1:5051
I0605 14:58:23.785957 10710 slave.cpp:249] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_environment_variables="{"LD_LIBRARY_PATH":"\/home\/aaron\/Code\/src\/mesos\/build\/src\/.libs"}" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_heartbeat_interval="30secs" --image_providers="docker" --image_provisioner_backend="overlay" --initialize_driver_logging="true" --isolation="cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime" --launcher="linux" --launcher_dir="/home/aaron/Code/src/mesos/build/src" --logbufsecs="0" --logging_level="INFO" --master="10.0.2.15:5050" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/slave"
I0605 14:58:23.786392 10710 slave.cpp:552] Agent resources: cpus(*):6; mem(*):6956; disk(*):41113; ports(*):[31000-32000]
I0605 14:58:23.786437 10710 slave.cpp:560] Agent attributes: [ ]
I0605 14:58:23.786468 10710 slave.cpp:565] Agent hostname: U64
I0605 14:58:23.786574 10714 status_update_manager.cpp:177] Pausing sending status updates
I0605 14:58:23.787470 10718 state.cpp:62] Recovering state from '/tmp/slave/meta'
I0605 14:58:23.787698 10713 status_update_manager.cpp:203] Recovering status update manager
I0605 14:58:23.7
[jira] [Created] (MESOS-7622) Agent crashes if the default executor launches a custom executor which then tries to subscribe
Aaron Wood created MESOS-7622:

Summary: Agent crashes if the default executor launches a custom executor which then tries to subscribe
Key: MESOS-7622
URL: https://issues.apache.org/jira/browse/MESOS-7622
Project: Mesos
Issue Type: Bug
Components: agent, executor
Reporter: Aaron Wood
Assignee: Anand Mazumdar
Priority: Blocker

{code}
sudo ./mesos-agent --master=10.0.2.15:5050 --work_dir=/tmp/slave --isolation=cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime --image_providers=docker --image_provisioner_backend=overlay --containerizers=mesos --launcher_dir=$(pwd) --executor_environment_variables='{"LD_LIBRARY_PATH": "/home/aaron/Code/src/mesos/build/src/.libs"}'
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0605 14:58:23.748180 10710 main.cpp:323] Build: 2017-06-02 17:09:05 UTC by aaron
I0605 14:58:23.748252 10710 main.cpp:324] Version: 1.4.0
I0605 14:58:23.755409 10710 systemd.cpp:238] systemd version `232` detected
I0605 14:58:23.755450 10710 main.cpp:433] Initializing systemd state
I0605 14:58:23.763049 10710 systemd.cpp:326] Started systemd slice `mesos_executors.slice`
I0605 14:58:23.763777 10710 resolver.cpp:69] Creating default secret resolver
I0605 14:58:23.764214 10710 containerizer.cpp:230] Using isolation: cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime,volume/image,environment_secret
I0605 14:58:23.767192 10710 linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
E0605 14:58:23.770179 10710 shell.hpp:107] Command 'hadoop version 2>&1' failed; this is the output:
sh: 1: hadoop: not found
I0605 14:58:23.770217 10710 fetcher.cpp:69] Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was either not found or exited with a non-zero exit status: 127
I0605 14:58:23.770643 10710 provisioner.cpp:255] Using default backend 'overlay'
I0605 14:58:23.785892 10710 slave.cpp:248] Mesos agent started on (1)@127.0.1.1:5051
I0605 14:58:23.785957 10710 slave.cpp:249] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_environment_variables="{"LD_LIBRARY_PATH":"\/home\/aaron\/Code\/src\/mesos\/build\/src\/.libs"}" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_heartbeat_interval="30secs" --image_providers="docker" --image_provisioner_backend="overlay" --initialize_driver_logging="true" --isolation="cgroups/cpu,cgroups/mem,disk/du,network/cni,filesystem/linux,docker/runtime" --launcher="linux" --launcher_dir="/home/aaron/Code/src/mesos/build/src" --logbufsecs="0" --logging_level="INFO" --master="10.0.2.15:5050" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/slave"
I0605 14:58:23.786392 10710 slave.cpp:552] Agent resources: cpus(*):6; mem(*):6956; disk(*):41113; ports(*):[31000-32000]
I0605 14:58:23.786437 10710 slave.cpp:560] Agent attributes: [ ]
I0605 14:58:23.786468 10710 slave.cpp:565] Agent hostname: U64
I0605 14:58:23.786574 10714 status_update_manager.cpp:177] Pausing sending status updates
I0605 14:58:23.787470 10718 state.cpp:62] Recovering state from '/
[jira] [Created] (MESOS-7621) Fetcher does not handle content length in redirects
Charles Allen created MESOS-7621:

Summary: Fetcher does not handle content length in redirects
Key: MESOS-7621
URL: https://issues.apache.org/jira/browse/MESOS-7621
Project: Mesos
Issue Type: Bug
Components: fetcher
Affects Versions: 1.2.0
Reporter: Charles Allen

{code}
$ curl -L -v -O -s http://HOSTNAME_REDACTED/PATH_REDACTED.tar.gz
* Trying 172.17.4.10...
* Connected to HOSTNAME_REDACTED (172.17.4.10) port 80 (#0)
> GET /PATH_REDACTED.tar.gz HTTP/1.1
> Host: HOSTNAME_REDACTED
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 302 FOUND
< Server: nginx/1.4.6 (Ubuntu)
< Date: Mon, 05 Jun 2017 17:58:04 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 1947
< Connection: keep-alive
< Location: https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED?Signature=REDACTED%3D&Expires=1496689084&AWSAccessKeyId=KEY_REDACTED&x-amz-security-token=TOKEN_REDACTED%3D
<
* Ignoring the response-body
{ [309 bytes data]
* Connection #0 to host HOSTNAME_REDACTED left intact
* Issue another request to this URL: 'https://BUCKET_REDACTED.s3.amazonaws.com:443/PATH_REDACTED.tar.gz?Signature=SIGNATURE_REDACTED%3D&Expires=1496689084&AWSAccessKeyId=KEY_REDACTED&x-amz-security-token=TOKEN_REDACTED%3D'
* Trying 54.231.40.75...
* Connected to BUCKET_REDACTED.s3.amazonaws.com (54.231.40.75) port 443 (#1)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate: *.s3.amazonaws.com
* Server certificate: DigiCert Baltimore CA-2 G2
* Server certificate: Baltimore CyberTrust Root
> GET /PATH_REDACTED.tar.gz?Signature=REDACTED&Expires=1496689084&AWSAccessKeyId=KEY_REDACTED&x-amz-security-token=TOKEN_REDACTED%3D HTTP/1.1
> Host: BUCKET_REDACTED.s3.amazonaws.com
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
< x-amz-id-2: ID_REDACTED=
< x-amz-request-id: REQUEST_ID_REDACTED
< Date: Mon, 05 Jun 2017 17:58:07 GMT
< Last-Modified: Thu, 01 Jun 2017 03:04:49 GMT
< ETag: "ETAG_REDACTED"
< Accept-Ranges: bytes
< Content-Type: application/x-tar
< Content-Length: 208245664
< Server: AmazonS3
<
{ [16360 bytes data]
{code}

We have a micro-service that signs temporary URLs for services that can't speak natively with S3. The above is an example download using {{curl}}. But when using the Mesos fetcher, the agent logs contain the following information:

{code}
fetcher.cpp:479] Reverting to fetching directly into the sandbox for 'http://HOST_REDACTED/PATH_REDACTED.tar.gz', due to failure to fetch through the cache, with error: Could not determine size of cache file for 'USER_REDACTED@http://HOST_REDACTED/PATH_REDACTED.tar.gz' with error: No URL content-length available
{code}

Any idea why this error would occur?

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
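The curl transcript in MESOS-7621 shows two different `Content-Length` values: 1947 on the 302 response and 208245664 on the final 200 response. The report does not say why the fetcher fails to find a length, so the following Python sketch is only an illustration of that distinction, not the Mesos fetcher's logic. The `Response` type and the `final_content_length` helper are hypothetical names introduced here; the header values are copied from the transcript.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Hypothetical model of one HTTP response: (status code, headers).
Response = Tuple[int, Mapping[str, str]]

# Status codes treated as redirects in this sketch.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def final_content_length(chain: Sequence[Response]) -> Optional[int]:
    """Return the Content-Length of the first non-redirect response, if any."""
    for status, headers in chain:
        if status in REDIRECT_STATUSES:
            # A redirect's Content-Length describes the redirect body,
            # not the file that the Location header points to.
            continue
        # Header names are case-insensitive, so normalise before the lookup.
        lowered = {name.lower(): value for name, value in headers.items()}
        value = lowered.get("content-length")
        return int(value) if value is not None else None
    # The chain ended on a redirect (or was empty): no usable length.
    return None


# Values taken from the curl transcript in the report.
chain = [
    (302, {"Content-Length": "1947"}),
    (200, {"Content-Type": "application/x-tar", "Content-Length": "208245664"}),
]
print(final_content_length(chain))
```

A client that inspects only the first response, or that issues a request the redirect target answers without a length, would have nothing usable to report; whether either applies to the fetcher is an open question in the report.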
[jira] [Updated] (MESOS-7619) Framework Upgrade Resulting in Jan 1, 1070 Date
[ https://issues.apache.org/jira/browse/MESOS-7619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Cook updated MESOS-7619:

Attachment: state.json

> Framework Upgrade Resulting in Jan 1, 1070 Date
>
> Key: MESOS-7619
> URL: https://issues.apache.org/jira/browse/MESOS-7619
> Project: Mesos
> Issue Type: Bug
> Reporter: Ken Sipe
> Attachments: Pasted image at 2017_05_31 09_30 AM.png, state.json
>
> In the process of upgrading Apache Mesos and Marathon (in HA mode), Marathon ended up with a new framework ID, and the older framework ID is listed as being from Jan 1, 1970 (47 years ago).
> The issue with Marathon getting a new framework ID is understood and was worked out with Mesosphere's Marathon team. Most of the detail is in the #marathon channel of the Apache Mesos Slack.
[jira] [Updated] (MESOS-7619) Framework Upgrade Resulting in Jan 1, 1070 Date
[ https://issues.apache.org/jira/browse/MESOS-7619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Cook updated MESOS-7619:

Attachment: (was: state.json)

> Framework Upgrade Resulting in Jan 1, 1070 Date
>
> Key: MESOS-7619
> URL: https://issues.apache.org/jira/browse/MESOS-7619
> Project: Mesos
> Issue Type: Bug
> Reporter: Ken Sipe
> Attachments: Pasted image at 2017_05_31 09_30 AM.png, state.json
>
> In the process of upgrading Apache Mesos and Marathon (in HA mode), Marathon ended up with a new framework ID, and the older framework ID is listed as being from Jan 1, 1970 (47 years ago).
> The issue with Marathon getting a new framework ID is understood and was worked out with Mesosphere's Marathon team. Most of the detail is in the #marathon channel of the Apache Mesos Slack.
[jira] [Commented] (MESOS-7619) Framework Upgrade Resulting in Jan 1, 1070 Date
[ https://issues.apache.org/jira/browse/MESOS-7619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037280#comment-16037280 ]

Andy Cook commented on MESOS-7619:

Hello,

Please find the {{state.json}} info attached. I've removed all of the info about running and completed tasks. The old framework (with all of the running tasks) is {{80f08ece-91c7-43cb-bae7-6a2b41e25ec0-0001}}. You'll see in the screenshot that Mesos believes it was registered 47 years ago. Similarly, the {{state.json}} shows a {{registered_time}} of {{0}}.

{noformat}
"failover_timeout":604800.0,
"checkpoint":true,
"registered_time":0.0,
"unregistered_time":0.0,
{noformat}

> Framework Upgrade Resulting in Jan 1, 1070 Date
>
> Key: MESOS-7619
> URL: https://issues.apache.org/jira/browse/MESOS-7619
> Project: Mesos
> Issue Type: Bug
> Reporter: Ken Sipe
> Attachments: Pasted image at 2017_05_31 09_30 AM.png, state.json
>
> In the process of upgrading Apache Mesos and Marathon (in HA mode), Marathon ended up with a new framework ID, and the older framework ID is listed as being from Jan 1, 1970 (47 years ago).
> The issue with Marathon getting a new framework ID is understood and was worked out with Mesosphere's Marathon team. Most of the detail is in the #marathon channel of the Apache Mesos Slack.
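The link between the `registered_time` of 0.0 in the comment on MESOS-7619 and the "47 years ago" display can be checked with a short Python sketch. It assumes, as the comment implies but does not state outright, that `registered_time` is a count of seconds since the Unix epoch; the helper name `describe_registered_time` is hypothetical, and 2017 is the year of the report.

```python
from datetime import datetime, timezone


def describe_registered_time(seconds_since_epoch: float, now_year: int) -> str:
    """Render a timestamp the way a 'registered N years ago' UI might."""
    # Assumption: the value is seconds since 1970-01-01T00:00:00Z.
    registered = datetime.fromtimestamp(seconds_since_epoch, tz=timezone.utc)
    years_ago = now_year - registered.year
    return f"{registered.date().isoformat()} ({years_ago} years ago)"


# registered_time from the state.json excerpt, viewed in 2017.
print(describe_registered_time(0.0, 2017))  # prints "1970-01-01 (47 years ago)"
```

Under that assumption, a zero timestamp is simply an unset value rendered as a date, which matches what the screenshot is described as showing.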
[jira] [Updated] (MESOS-7619) Framework Upgrade Resulting in Jan 1, 1070 Date
[ https://issues.apache.org/jira/browse/MESOS-7619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Cook updated MESOS-7619:

Attachment: state.json

> Framework Upgrade Resulting in Jan 1, 1070 Date
>
> Key: MESOS-7619
> URL: https://issues.apache.org/jira/browse/MESOS-7619
> Project: Mesos
> Issue Type: Bug
> Reporter: Ken Sipe
> Attachments: Pasted image at 2017_05_31 09_30 AM.png, state.json
>
> In the process of upgrading Apache Mesos and Marathon (in HA mode), Marathon ended up with a new framework ID, and the older framework ID is listed as being from Jan 1, 1970 (47 years ago).
> The issue with Marathon getting a new framework ID is understood and was worked out with Mesosphere's Marathon team. Most of the detail is in the #marathon channel of the Apache Mesos Slack.
[jira] [Created] (MESOS-7620) GET_VOLUMES call referenced in API docs, but the call doesn't exist
James DeFelice created MESOS-7620:

Summary: GET_VOLUMES call referenced in API docs, but the call doesn't exist
Key: MESOS-7620
URL: https://issues.apache.org/jira/browse/MESOS-7620
Project: Mesos
Issue Type: Bug
Reporter: James DeFelice

https://github.com/apache/mesos/blob/d624255394b864ed477838e32f9712d7e63fc86f/include/mesos/v1/master/master.proto#L150

{code}
// Create persistent volumes on reserved resources. The request is forwarded
// asynchronously to the Mesos agent where the reserved resources are located.
// That asynchronous message may not be delivered or creating the volumes at
// the agent might fail. Volume creation can be verified by sending a
// `GET_VOLUMES` call.
{code}

It's either a documentation bug, or a missing/overlooked feature.

/cc [~vinodkone] [~jieyu]
[jira] [Updated] (MESOS-7619) Framework Upgrade Resulting in Jan 1, 1070 Date
[ https://issues.apache.org/jira/browse/MESOS-7619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ken Sipe updated MESOS-7619:

Attachment: Pasted image at 2017_05_31 09_30 AM.png

screen shot

> Framework Upgrade Resulting in Jan 1, 1070 Date
>
> Key: MESOS-7619
> URL: https://issues.apache.org/jira/browse/MESOS-7619
> Project: Mesos
> Issue Type: Bug
> Reporter: Ken Sipe
> Attachments: Pasted image at 2017_05_31 09_30 AM.png
>
> In the process of upgrading Apache Mesos and Marathon (in HA mode), Marathon ended up with a new framework ID, and the older framework ID is listed as being from Jan 1, 1970 (47 years ago).
> The issue with Marathon getting a new framework ID is understood and was worked out with Mesosphere's Marathon team. Most of the detail is in the #marathon channel of the Apache Mesos Slack.
[jira] [Created] (MESOS-7619) Framework Upgrade Resulting in Jan 1, 1070 Date
Ken Sipe created MESOS-7619:

Summary: Framework Upgrade Resulting in Jan 1, 1070 Date
Key: MESOS-7619
URL: https://issues.apache.org/jira/browse/MESOS-7619
Project: Mesos
Issue Type: Bug
Reporter: Ken Sipe

In the process of upgrading Apache Mesos and Marathon (in HA mode), Marathon ended up with a new framework ID, and the older framework ID is listed as being from Jan 1, 1970 (47 years ago).

The issue with Marathon getting a new framework ID is understood and was worked out with Mesosphere's Marathon team. Most of the detail is in the #marathon channel of the Apache Mesos Slack.