[
https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14303702#comment-14303702
]
Marcelo Vanzin edited comment on SPARK-5388 at 2/3/15 7:48 PM:
---
Hi [~pwendell],
Let me try to write a point-by-point feedback for the current spec.
h4. Public protocol or not?
If this is not supposed to be public (i.e., we don't expect someone to try to
talk directly to the Spark master; it's always going to happen through
the Spark libraries), then the underlying protocol is less important, since we
only care about different versions being compatible in some way.
Assuming a non-public protocol, my question would be: why implement your own
RPC framework? Why not reuse something that's already there? For example, Avro
has a stable serialization infrastructure that defines semantics for versioned
data, and works well on top of HTTP. It handles serialization and dispatching -
which would remove a lot of code from the current patch, and probably has other
features that the current, cluster-mode only protocol doesn't need but other
future uses might.
h4. Non-submission uses
Similarly, in the non-public protocol scenario, a proper REST-based API would
look like overkill. But a proper REST infrastructure provides interesting room
for growth of the master's public-facing API. For example, you could easily
expose an endpoint for listing the current applications being tracked by the
master, or an endpoint to kill an application. The former could also benefit
the history server, which could expose the same API to list the applications it
has found.
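To make that concrete, a hypothetical set of endpoints along those lines (none of these paths are in the current spec) might look like:

```
GET    /v1/applications          # list applications tracked by the master
GET    /v1/applications/{appId}  # details for a single application
DELETE /v1/applications/{appId}  # kill a running application
POST   /v1/submissions           # submit a new application
```

The history server could then serve the same GET endpoints for the applications it has found, with no extra protocol work.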
h4. Evolvability and Versioning
The current spec does not specify the behavior of the cluster nor the client
with regards to different versions of the protocol. It has a table that
basically says future versions need to be able to submit standalone cluster
applications to a 1.3 master, but it doesn't explain what that means or how
that happens.
Does it mean that, after 1.3, you can't ever change any of the messages used to
launch a standalone cluster app, nor can you add new messages? Or, if that's
allowed, what happens on the server side if it sees a field it doesn't
understand? Does it ignore it, which could potentially break the application
being submitted? Does it throw an error, in which case the client should make
sure to submit an older version of the data structures if that's compatible
with the app being submitted? If the latter, how does it know which version to
use?
As an example of how you could do this negotiation: the client checks what
features the app being submitted needs, and chooses the oldest supported API
version based on that. It can then submit the request to, e.g., /v2 and, if
submitting to a 1.3 cluster, the request will fail, because that cluster
doesn't support the features needed by the app.
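A sketch of that client-side choice (all names and feature strings here are made up for illustration, not from the spec):

```python
def choose_api_version(required_features, supported):
    """Pick the oldest API version whose feature set covers everything
    the application being submitted needs.

    required_features: set of feature names the app needs.
    supported: dict mapping version number -> set of features that
               version provides (hypothetical; not from the spec).
    """
    for version in sorted(supported):
        if required_features <= supported[version]:
            return version
    raise ValueError("no supported API version provides all of: %s"
                     % sorted(required_features))

# A 1.3 master would only serve /v1; an app needing a v2-only feature
# simply cannot be submitted to it, and the client knows that up front.
versions = {1: {"submit", "kill"}, 2: {"submit", "kill", "upload-files"}}
print(choose_api_version({"submit"}, versions))                  # -> 1
print(choose_api_version({"submit", "upload-files"}, versions))  # -> 2
```

The point is that the mapping from features to versions has to be part of the spec for this to work; the client can't guess it.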
Also, thinking about the framework, what if later you need different features
than the ones provided now? What if you need to use query params, path params,
or non-json request bodies (e.g. for uploading files)? Are you going to extend
the current framework to the point where it starts looking like other existing
ones?
Or, if HTTP is being used mostly as a dumb pipe, what are the semantics for
when something goes wrong? Should clients only look at the response body when
the status is 200 OK, or should they try to interpret the body of a 500
Internal Server Error or a 400 Bad Request? Those things need to be specified.
h4. Others
If the suggestions above don't sound particularly interesting for this use
case, I'd strongly suggest, at the very least, removing any mention of REST
from the spec and the code, because this is not a REST protocol in any way.
Also, a question: if it's an HTTP protocol, why not expose it through the
existing http port?
To reply to the questions about my suggestions for how to use REST:
* when you add a new version, you don't remove old ones. Spark v1.4 could add
/v2, but it must still support /v1 in the way that it was specified.
* as for new fields / types, that really depends on how you specify things.
Personally, I like to declare a released API frozen: you can't add new types,
fields, or anything that the old release doesn't know about. Any new thing
requires a new protocol version. But you could take a different approach, by
adding optional fields that don't cause breakage when submitted to an older
server that doesn't know about them. Again, these choices need to be specified
up front, otherwise the implementation of v1 becomes the spec, since where the
spec is not clear, the choices made by the implementation will become a de
facto specification.
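The difference between the two approaches can be sketched like this (the v1 field names are made up for illustration):

```python
import json

V1_FIELDS = {"appResource", "mainClass", "appArgs"}  # hypothetical v1 schema

def validate_strict(body):
    """Frozen-API approach: any field v1 doesn't know about is an error,
    forcing the client to fall back to an older message version."""
    unknown = set(json.loads(body)) - V1_FIELDS
    if unknown:
        raise ValueError("unknown fields: %s" % sorted(unknown))
    return True

def validate_lenient(body):
    """Optional-field approach: silently drop what the server doesn't
    understand -- which may break the app if the field mattered."""
    msg = json.loads(body)
    return {k: v for k, v in msg.items() if k in V1_FIELDS}

ok = '{"appResource": "app.jar", "mainClass": "Main"}'
new = '{"appResource": "app.jar", "mainClass": "Main", "pyFiles": "a.py"}'
print(validate_strict(ok))
print(validate_lenient(new))  # "pyFiles" is silently dropped
```

Either behavior is workable; what's not workable is leaving the choice to whatever the v1 code happens to do.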
(BTW, especially with a v1, the implementation will invariably become a de
facto specification; that's unavoidable. But it helps to have the spec clearly
cover all areas, so that hopefully the implementation doesn't have to make
those choices on its own.)