Re: RFC: packaging Spark without assemblies

2015-09-26 Thread Steve Loughran

> On 25 Sep 2015, at 19:11, Marcelo Vanzin  wrote:
> 
> - People who ship the assembly with their application. As Matei
> suggested (and I agree), that is kinda weird. But currently that is
> the easiest way to embed Spark and get, for example, the YARN backend
> working. There are ways around that but they are tricky. The code
> changes I propose would make that much easier to do without the need
> for an assembly.

not weird if you are bypassing bin/spark


> 
> - People who somehow depend on the layout of the Spark distribution.
> Meaning they expect a "lib/" directory with an assembly in there
> matching a specific file name pattern. Although I kinda consider that
> to be an invalid use case (as in "you're doing it wrong").

well, the spark-submit and spark-example shell scripts do something close to this, though
primarily as error checking against >1 artifact and classpath confusion
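The kind of layout check discussed here (a `lib/` directory that must hold exactly one assembly matching a file name pattern) can be sketched roughly as follows. This is a minimal Python sketch, not the actual spark-submit logic: the function name `find_single_assembly`, the `AssemblyLookupError` class, and the `spark-assembly*hadoop*.jar` pattern are assumptions for illustration.

```python
import fnmatch
import os


class AssemblyLookupError(Exception):
    """Raised when lib_dir does not contain exactly one assembly jar."""


def find_single_assembly(lib_dir, pattern="spark-assembly*hadoop*.jar"):
    """Return the path of the single assembly jar in lib_dir.

    Hypothetical helper: fails when there are zero matches (nothing built)
    or more than one match (ambiguous classpath). The default pattern is an
    assumed naming convention, not taken from the Spark scripts.
    """
    # Case-sensitive match so behaviour is the same on every platform.
    matches = sorted(
        name for name in os.listdir(lib_dir)
        if fnmatch.fnmatchcase(name, pattern)
    )
    if not matches:
        raise AssemblyLookupError(f"No Spark assembly found in {lib_dir}")
    if len(matches) > 1:
        raise AssemblyLookupError(
            f"Multiple Spark assemblies found in {lib_dir}: {', '.join(matches)}"
        )
    return os.path.join(lib_dir, matches[0])
```

Any external tool that hard-codes a check like this stops working once the distribution no longer ships a single assembly, which is the compatibility case being discussed.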

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: RFC: packaging Spark without assemblies

2015-09-25 Thread Marcelo Vanzin
On Wed, Sep 23, 2015 at 4:43 PM, Patrick Wendell  wrote:
> For me a key step in moving away would be to fully audit/understand
> all compatibility implications of removing it. If other people are
> supportive of this plan I can offer to help spend some time thinking
> about any potential corner cases, etc.

Thanks Patrick (and all the others) who commented on the document.

For backwards compatibility (BC), I think there are two main cases:

- People who ship the assembly with their application. As Matei
suggested (and I agree), that is kinda weird. But currently that is
the easiest way to embed Spark and get, for example, the YARN backend
working. There are ways around that but they are tricky. The code
changes I propose would make that much easier to do without the need
for an assembly.

- People who somehow depend on the layout of the Spark distribution.
Meaning they expect a "lib/" directory with an assembly in there
matching a specific file name pattern. Although I kinda consider that
to be an invalid use case (as in "you're doing it wrong").

One potential way to avoid it is to do the work to make the assemblies
unnecessary, but not get rid of them, at least at first. Maybe a build
profile or an argument in make-distribution.sh to enable or disable
them as desired.
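One way to picture "make the assemblies unnecessary, but not get rid of them" is a launcher that accepts either layout. The sketch below is hypothetical: `build_classpath`, its `use_assembly` switch, and the jar names are invented for illustration and do not reflect the code in the linked proposal.

```python
import glob
import os


def build_classpath(lib_dir, use_assembly=False):
    """Build a classpath string for a hypothetical Spark distribution.

    use_assembly=True:  expect exactly one spark-assembly*.jar (old layout).
    use_assembly=False: put every jar found in lib_dir on the classpath.
    """
    if use_assembly:
        jars = glob.glob(os.path.join(lib_dir, "spark-assembly*.jar"))
        if len(jars) != 1:
            raise ValueError(
                f"expected exactly one assembly in {lib_dir}, found {len(jars)}"
            )
    else:
        jars = glob.glob(os.path.join(lib_dir, "*.jar"))
        if not jars:
            raise ValueError(f"no jars found in {lib_dir}")
    # Sort so the classpath order is deterministic across runs.
    return os.pathsep.join(sorted(jars))
```

Listing the jars explicitly keeps the order predictable; the JVM also accepts a `lib/*` wildcard entry on the classpath, which would avoid enumerating the jars at all.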

-- 
Marcelo




RFC: packaging Spark without assemblies

2015-09-23 Thread Marcelo Vanzin
Hey all,

This is something that we've discussed several times internally but
never really had much time to look into. As time passes, though, it's
increasingly becoming an issue for us, and I'd like to throw some ideas
around about how to fix it.

So, without further ado:
https://github.com/vanzin/spark/pull/2/files

(You can comment there or click "View" to read the formatted document.
I thought that would be easier than sharing on Google Drive or Box or
something.)

It would be great to get people's feedback, especially if there are
strong reasons for the assemblies that I'm not aware of.


-- 
Marcelo




Re: RFC: packaging Spark without assemblies

2015-09-23 Thread Patrick Wendell
I think it would be a big improvement to get rid of it. It's not how
jars are supposed to be packaged and it has caused problems in many
different contexts over the years.

For me a key step in moving away would be to fully audit/understand
all compatibility implications of removing it. If other people are
supportive of this plan I can offer to help spend some time thinking
about any potential corner cases, etc.

- Patrick

On Wed, Sep 23, 2015 at 3:13 PM, Marcelo Vanzin  wrote:
> Hey all,
>
> This is something that we've discussed several times internally, but
> never really had much time to look into; but as time passes by, it's
> increasingly becoming an issue for us and I'd like to throw some ideas
> around about how to fix it.
>
> So, without further ado:
> https://github.com/vanzin/spark/pull/2/files
>
> (You can comment there or click "View" to read the formatted document.
> I thought that would be easier than sharing on Google Drive or Box or
> something.)
>
> It would be great to get people's feedback, especially if there are
> strong reasons for the assemblies that I'm not aware of.
>
>
> --
> Marcelo
>
