Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread John Blum
For clarification... what I specifically mean when I say "level of
modularity" is reflected in the dependencies between modules. The POM
distinguishes required vs. non-required dependencies based on their "scope"
(e.g. 'compile'-time vs. 'optional', and so on).
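
As a rough illustration, here is the same idea expressed as an sbt-style build
snippet (the coordinates and versions are only illustrative, not a statement of
what Geode actually declares):

    // build.sbt (sketch): scope-based modularity.
    // Plain (compile-scope) dependencies are required by every consumer of the artifact;
    // "provided"/"optional" dependencies only matter when the corresponding feature is used.
    libraryDependencies ++= Seq(
      "org.apache.geode"  %  "geode-core"    % "1.0.0-incubating",         // required
      "org.apache.spark"  %% "spark-core"    % "1.3.1" % "provided",       // supplied by the Spark runtime
      "org.apache.hadoop" %  "hadoop-client" % "2.6.0" % "optional"        // only needed for the HDFS feature
    )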

If you look at the Maven POM files in Spring, you will begin to understand
how bits and pieces of the framework can be used independently from the
other pieces, how features are only enabled if certain classes are detected
on the classpath, etc.  E.g. I can use Spring DI independently of the
Spring Transaction Management infrastructure or vice versa; the two are
actually quite unrelated, but they complement each other when combined
with AOP, for instance.

By way of example, "persistence" is one concern that could be modularized
and made pluggable (write to the oplog, write to HDFS, or forgo both and write
to an underlying RDBMS, whatever), and enabled based on adding the corresponding
JAR for the desired behavior.
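
A minimal sketch of that classpath-driven enablement (the class and module names
below are hypothetical, purely to illustrate the pattern):

    // Enable a persistence feature only if its implementation JAR is on the classpath.
    object PersistenceModules {
      private def present(className: String): Boolean =
        try { Class.forName(className); true }
        catch { case _: ClassNotFoundException => false }

      def resolve(): String =
        if (present("example.persistence.hdfs.HdfsStore")) "hdfs"        // hypothetical HDFS module
        else if (present("example.persistence.jdbc.JdbcStore")) "rdbms"  // hypothetical RDBMS module
        else "oplog"                                                     // default, always available
    }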

Anyway, food for thought.

-j


On Tue, Jul 7, 2015 at 3:58 PM, John Blum  wrote:

> There are a few Spring projects that are exemplary (examples) in their
> modularity, contained within a single repo.  The core Spring Framework and
> Spring Boot are 2 such projects that immediately come to mind.
>
> However, this sort of disciplined modularity requires a very important
> delineation of responsibilities / separation of concerns reflected in the
> organization (and cleanliness) of the codebase combined with a very
> well-understood set of principles and practices to ensure this level of
> modularity is maintained; Geode is none of these things at the moment.  I
> echo Kirk's early concerns about build times and testing, etc, not to
> mention the gravity surrounding what is pertinent and what is not in order
> to contribute to the "core" of Geode.
>
> -j
>
> On Tue, Jul 7, 2015 at 2:55 PM, William Markito 
> wrote:
>
>> Folks,
>>
>> There is a lot of good and valuable points on this thread, however we need
>> to discuss some practical actions here and maybe even see what other
>> projects have already done during their incubation.
>>
>> For example, Apache Zeppelin (incubating) is also dependent on Spark and
>> what they do is select which version of Spark you're going to build
>> against.
>>
>> https://github.com/apache/incubator-zeppelin/tree/master
>>
>> Given that we don't yet have a single release, I'm not sure we should
>> already be that concerned about having and maintaining multiple sub
>> repositories or even sub projects.
>>
>> That said, it doesn't mean we shouldn't be concerned about modularization,
>> it's just that we don't actually need yet to have releases of each
>> independent modules trying to catch-up with other projects release cycle
>> without having a release cycle for our project yet.
>>
>> IHMO, whenever a Geode release happens we may need to decide if it's going
>> to support an specific Spark version or be built and support multiple
>> versions (like Zeppelin) -  That also may apply to HDFS since we also
>> support HDFS but we don't have HDFS integration as a separate repository
>> just in order to catch up with HDFS release cycle. It doesn't mean it
>> shouldn't be modularized and developed as a separate project under Geode.
>>
>> IOW, I'd vote to keep everything in the same repository, as different
>> projects, with modularized code and dependencies and when time comes,
>> after
>> a couple releases, if it makes sense, sure, break it in different
>> repositories or sub-projects that may grow by themselves.
>>
>> To be honest, I'm not exactly sure why the modularization discussion has
>> to
>> be so tied with repositories.  Spring or any other DI framework for
>> example
>> allows you to write nice and decoupled code... not mention other
>> techniques...
>>
>> My 0.0.2 cents (following semantic versioning :) )
>>
>> ~/William
>>
>> On Tue, Jul 7, 2015 at 1:33 PM, John Blum  wrote:
>>
>> > Just a quick word on maintaining different (release) branches for main
>> > dependencies (.e.g. "driver" dependencies).  Again, this is exactly what
>> > Spring Data GemFire does to support GemFire, and now Geode.  In fact, it
>> > has to be this way for Apache Geode and Pivotal GemFire given the fork
>> in
>> > the codebase the current disparity between sga2 and develop.
>> >
>> > However, this is not to say that the prior release branches will be
>> > maintained indefinitely.  In fact, they are only maintained back to a
>> > certain release of GemFire (currently, 7.0.2). So it looks a little
>> > something like this...
>> >
>> > SDG 1.4.x -> GemFire 7.0.2 (currently SDG 1.4.6, or SR6)
>> > SDG 1.5.x -> GemFire 7.0.2 (currently SDG 1.5.3, or SR3)
>> > SDG 1.6.x -> GemFire 8.0.0 (currently SDG 1.6.1, or SR1)
>> > SDG 1.7.x -> GemFire 8.1.0 (currently SDG 1.7 M1, or Milestone 1).
>> >
>> > So, yes, that means I actively maintain 3-4 different versions of SDG,
>> > though SDG 1.4.x has reached it's EOL, and soon too will the SDG 1.

Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread Bruce Schuchardt

+1

On 7/7/2015 3:58 PM, John Blum wrote:

There are a few Spring projects that are exemplary (examples) in their
modularity, contained within a single repo.  The core Spring Framework and
Spring Boot are 2 such projects that immediately come to mind.

However, this sort of disciplined modularity requires a very important
delineation of responsibilities / separation of concerns reflected in the
organization (and cleanliness) of the codebase combined with a very
well-understood set of principles and practices to ensure this level of
modularity is maintained; Geode is none of these things at the moment.  I
echo Kirk's early concerns about build times and testing, etc, not to
mention the gravity surrounding what is pertinent and what is not in order
to contribute to the "core" of Geode.

-j

On Tue, Jul 7, 2015 at 2:55 PM, William Markito  wrote:


Folks,

There are a lot of good and valuable points on this thread; however, we need
to discuss some practical actions here and maybe even see what other
projects have already done during their incubation.

For example, Apache Zeppelin (incubating) is also dependent on Spark and
what they do is select which version of Spark you're going to build
against.

https://github.com/apache/incubator-zeppelin/tree/master

Given that we don't yet have a single release, I'm not sure we should
already be that concerned about having and maintaining multiple sub
repositories or even sub projects.

That said, it doesn't mean we shouldn't be concerned about modularization;
it's just that we don't yet need releases of each
independent module trying to catch up with other projects' release cycles
when we don't have a release cycle for our own project.

IMHO, whenever a Geode release happens we may need to decide if it's going
to support a specific Spark version or be built to support multiple
versions (like Zeppelin).  That may also apply to HDFS, since we
support HDFS but don't have the HDFS integration in a separate repository
just in order to catch up with the HDFS release cycle. It doesn't mean it
shouldn't be modularized and developed as a separate project under Geode.

IOW, I'd vote to keep everything in the same repository, as different
projects, with modularized code and dependencies, and when the time comes,
after a couple of releases, if it makes sense, sure, break it into different
repositories or sub-projects that may grow by themselves.

To be honest, I'm not exactly sure why the modularization discussion has to
be so tied to repositories.  Spring or any other DI framework, for example,
allows you to write nice, decoupled code... not to mention other
techniques...

My 0.0.2 cents (following semantic versioning :) )

~/William

On Tue, Jul 7, 2015 at 1:33 PM, John Blum  wrote:


Just a quick word on maintaining different (release) branches for main
dependencies (e.g. "driver" dependencies).  Again, this is exactly what
Spring Data GemFire does to support GemFire, and now Geode.  In fact, it
has to be this way for Apache Geode and Pivotal GemFire given the fork in
the codebase and the current disparity between sga2 and develop.

However, this is not to say that the prior release branches will be
maintained indefinitely.  In fact, they are only maintained back to a
certain release of GemFire (currently, 7.0.2). So it looks a little
something like this...

SDG 1.4.x -> GemFire 7.0.2 (currently SDG 1.4.6, or SR6)
SDG 1.5.x -> GemFire 7.0.2 (currently SDG 1.5.3, or SR3)
SDG 1.6.x -> GemFire 8.0.0 (currently SDG 1.6.1, or SR1)
SDG 1.7.x -> GemFire 8.1.0 (currently SDG 1.7 M1, or Milestone 1).

So, yes, that means I actively maintain 3-4 different versions of SDG,
though SDG 1.4.x has reached its EOL, and soon too will the SDG 1.5.x
line.

See the *Spring Data GemFire* project page [0] for further details.

You can also see the *Spring Data GemFire* GitHub project [1] for
release branches and tags as well.

-j

[0] - http://projects.spring.io/spring-data-gemfire/
[1] - https://github.com/spring-projects/spring-data-gemfire/releases


On Tue, Jul 7, 2015 at 1:16 PM, Dan Smith  wrote:


To support different versions of spark, wouldn't it be better to have a
single code base that has adapters for different versions of spark? It
seems like that would be better than maintaining several active

branches

with semi-duplicate code.

I do think it would be better to keep the geode spark connector in a
separate repository with a separate release cycle, for all of the

reasons

outlined on this thread (don't bloat the geode codebase, modularity,

etc.).

But I think there is also value in keeping it in the apache community

and

managing it through the apache process. I'm not sure how "just put it

on

github" would work out. Maybe it's just a matter of making it through

the

pain of the restrictive incubation process until we can split this code
out. And in the mean time 

Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread John Blum
There are a few Spring projects that are exemplary (examples) in their
modularity, contained within a single repo.  The core Spring Framework and
Spring Boot are 2 such projects that immediately come to mind.

However, this sort of disciplined modularity requires a very important
delineation of responsibilities / separation of concerns reflected in the
organization (and cleanliness) of the codebase combined with a very
well-understood set of principles and practices to ensure this level of
modularity is maintained; Geode is none of these things at the moment.  I
echo Kirk's early concerns about build times and testing, etc, not to
mention the gravity surrounding what is pertinent and what is not in order
to contribute to the "core" of Geode.

-j

On Tue, Jul 7, 2015 at 2:55 PM, William Markito  wrote:

> Folks,
>
> There is a lot of good and valuable points on this thread, however we need
> to discuss some practical actions here and maybe even see what other
> projects have already done during their incubation.
>
> For example, Apache Zeppelin (incubating) is also dependent on Spark and
> what they do is select which version of Spark you're going to build
> against.
>
> https://github.com/apache/incubator-zeppelin/tree/master
>
> Given that we don't yet have a single release, I'm not sure we should
> already be that concerned about having and maintaining multiple sub
> repositories or even sub projects.
>
> That said, it doesn't mean we shouldn't be concerned about modularization,
> it's just that we don't actually need yet to have releases of each
> independent modules trying to catch-up with other projects release cycle
> without having a release cycle for our project yet.
>
> IHMO, whenever a Geode release happens we may need to decide if it's going
> to support an specific Spark version or be built and support multiple
> versions (like Zeppelin) -  That also may apply to HDFS since we also
> support HDFS but we don't have HDFS integration as a separate repository
> just in order to catch up with HDFS release cycle. It doesn't mean it
> shouldn't be modularized and developed as a separate project under Geode.
>
> IOW, I'd vote to keep everything in the same repository, as different
> projects, with modularized code and dependencies and when time comes, after
> a couple releases, if it makes sense, sure, break it in different
> repositories or sub-projects that may grow by themselves.
>
> To be honest, I'm not exactly sure why the modularization discussion has to
> be so tied with repositories.  Spring or any other DI framework for example
> allows you to write nice and decoupled code... not mention other
> techniques...
>
> My 0.0.2 cents (following semantic versioning :) )
>
> ~/William
>
> On Tue, Jul 7, 2015 at 1:33 PM, John Blum  wrote:
>
> > Just a quick word on maintaining different (release) branches for main
> > dependencies (.e.g. "driver" dependencies).  Again, this is exactly what
> > Spring Data GemFire does to support GemFire, and now Geode.  In fact, it
> > has to be this way for Apache Geode and Pivotal GemFire given the fork in
> > the codebase the current disparity between sga2 and develop.
> >
> > However, this is not to say that the prior release branches will be
> > maintained indefinitely.  In fact, they are only maintained back to a
> > certain release of GemFire (currently, 7.0.2). So it looks a little
> > something like this...
> >
> > SDG 1.4.x -> GemFire 7.0.2 (currently SDG 1.4.6, or SR6)
> > SDG 1.5.x -> GemFire 7.0.2 (currently SDG 1.5.3, or SR3)
> > SDG 1.6.x -> GemFire 8.0.0 (currently SDG 1.6.1, or SR1)
> > SDG 1.7.x -> GemFire 8.1.0 (currently SDG 1.7 M1, or Milestone 1).
> >
> > So, yes, that means I actively maintain 3-4 different versions of SDG,
> > though SDG 1.4.x has reached it's EOL, and soon too will the SDG 1.5.x
> > line.
> >
> > See the *Spring Data GemFire* project page [0] for further details.
> >
> > You can also see the *Spring Data GemFire* GitHub project [1] for
> > release branches and tags as well.
> >
> > -j
> >
> > [0] - http://projects.spring.io/spring-data-gemfire/
> > [1] - https://github.com/spring-projects/spring-data-gemfire/releases
> >
> >
> > On Tue, Jul 7, 2015 at 1:16 PM, Dan Smith  wrote:
> >
> > > To support different versions of spark, wouldn't it be better to have a
> > > single code base that has adapters for different versions of spark? It
> > > seems like that would be better than maintaining several active
> branches
> > > with semi-duplicate code.
> > >
> > > I do think it would be better to keep the geode spark connector in a
> > > separate repository with a separate release cycle, for all of the
> reasons
> > > outlined on this thread (don't bloat the geode codebase, modularity,
> > etc.).
> > > But I think there is also value in keeping it in the apache community
> and
> > > managing it through the apache 

Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread William Markito
Folks,

There are a lot of good and valuable points on this thread; however, we need
to discuss some practical actions here and maybe even see what other
projects have already done during their incubation.

For example, Apache Zeppelin (incubating) is also dependent on Spark and
what they do is select which version of Spark you're going to build against.

https://github.com/apache/incubator-zeppelin/tree/master
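
Zeppelin does that through its build configuration; just to make the idea
concrete, here is a rough sbt-style equivalent (the property name and default
version are hypothetical):

    // build.sbt (sketch): pick the Spark version at build time, e.g.
    //   sbt -Dspark.version=1.3.1 package
    val sparkVersion = sys.props.getOrElse("spark.version", "1.4.0")

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
      "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided"
    )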

Given that we don't yet have a single release, I'm not sure we should
already be that concerned about having and maintaining multiple sub
repositories or even sub projects.

That said, it doesn't mean we shouldn't be concerned about modularization;
it's just that we don't yet need releases of each
independent module trying to catch up with other projects' release cycles
when we don't have a release cycle for our own project.

IMHO, whenever a Geode release happens we may need to decide if it's going
to support a specific Spark version or be built to support multiple
versions (like Zeppelin).  That may also apply to HDFS, since we
support HDFS but don't have the HDFS integration in a separate repository
just in order to catch up with the HDFS release cycle. It doesn't mean it
shouldn't be modularized and developed as a separate project under Geode.

IOW, I'd vote to keep everything in the same repository, as different
projects, with modularized code and dependencies, and when the time comes,
after a couple of releases, if it makes sense, sure, break it into different
repositories or sub-projects that may grow by themselves.

To be honest, I'm not exactly sure why the modularization discussion has to
be so tied to repositories.  Spring or any other DI framework, for example,
allows you to write nice, decoupled code... not to mention other
techniques...

My 0.0.2 cents (following semantic versioning :) )

~/William

On Tue, Jul 7, 2015 at 1:33 PM, John Blum  wrote:

> Just a quick word on maintaining different (release) branches for main
> dependencies (.e.g. "driver" dependencies).  Again, this is exactly what
> Spring Data GemFire does to support GemFire, and now Geode.  In fact, it
> has to be this way for Apache Geode and Pivotal GemFire given the fork in
> the codebase the current disparity between sga2 and develop.
>
> However, this is not to say that the prior release branches will be
> maintained indefinitely.  In fact, they are only maintained back to a
> certain release of GemFire (currently, 7.0.2). So it looks a little
> something like this...
>
> SDG 1.4.x -> GemFire 7.0.2 (currently SDG 1.4.6, or SR6)
> SDG 1.5.x -> GemFire 7.0.2 (currently SDG 1.5.3, or SR3)
> SDG 1.6.x -> GemFire 8.0.0 (currently SDG 1.6.1, or SR1)
> SDG 1.7.x -> GemFire 8.1.0 (currently SDG 1.7 M1, or Milestone 1).
>
> So, yes, that means I actively maintain 3-4 different versions of SDG,
> though SDG 1.4.x has reached it's EOL, and soon too will the SDG 1.5.x
> line.
>
> See the *Spring Data GemFire* project page [0] for further details.
>
> You can also see the *Spring Data GemFire* GitHub project [1] for
> release branches and tags as well.
>
> -j
>
> [0] - http://projects.spring.io/spring-data-gemfire/
> [1] - https://github.com/spring-projects/spring-data-gemfire/releases
>
>
> On Tue, Jul 7, 2015 at 1:16 PM, Dan Smith  wrote:
>
> > To support different versions of spark, wouldn't it be better to have a
> > single code base that has adapters for different versions of spark? It
> > seems like that would be better than maintaining several active branches
> > with semi-duplicate code.
> >
> > I do think it would be better to keep the geode spark connector in a
> > separate repository with a separate release cycle, for all of the reasons
> > outlined on this thread (don't bloat the geode codebase, modularity,
> etc.).
> > But I think there is also value in keeping it in the apache community and
> > managing it through the apache process. I'm not sure how "just put it on
> > github" would work out. Maybe it's just a matter of making it through the
> > pain of the restrictive incubation process until we can split this code
> > out. And in the mean time keeping it as loosely coupled as possible.
> >
> > -Dan
> >
> > On Tue, Jul 7, 2015 at 11:57 AM, John Blum  wrote:
> >
> > > +1 - Bingo, that tis the question.
> > >
> > > Part of the answer lies in having planned, predictable and a consistent
> > > cadence of releases.
> > >
> > > E.g. the *Spring Data* project  >
> > > [0]
> > > is an umbrella project managing 12 individual modules (e.g. SD... JPA,
> > > Mongo, Redis, Neo4j, GemFire, Cassandra, etc, dubbed the "release
> train")
> > > which all are at different versions and all have different external,
> > > critical (driver) dependencies.  The only dependen(cy|cies) all SD
> > modules
> > > have in common is the version of the core *Spring Framework* and the
> > 

Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread John Blum
Just a quick word on maintaining different (release) branches for main
dependencies (e.g. "driver" dependencies).  Again, this is exactly what
Spring Data GemFire does to support GemFire, and now Geode.  In fact, it
has to be this way for Apache Geode and Pivotal GemFire given the fork in
the codebase and the current disparity between sga2 and develop.

However, this is not to say that the prior release branches will be
maintained indefinitely.  In fact, they are only maintained back to a
certain release of GemFire (currently, 7.0.2). So it looks a little
something like this...

SDG 1.4.x -> GemFire 7.0.2 (currently SDG 1.4.6, or SR6)
SDG 1.5.x -> GemFire 7.0.2 (currently SDG 1.5.3, or SR3)
SDG 1.6.x -> GemFire 8.0.0 (currently SDG 1.6.1, or SR1)
SDG 1.7.x -> GemFire 8.1.0 (currently SDG 1.7 M1, or Milestone 1).

So, yes, that means I actively maintain 3-4 different versions of SDG,
though SDG 1.4.x has reached its EOL, and soon too will the SDG 1.5.x line.

See the *Spring Data GemFire* project page [0] for further details.

You can also see the *Spring Data GemFire* GitHub project [1] for
release branches and tags as well.

-j

[0] - http://projects.spring.io/spring-data-gemfire/
[1] - https://github.com/spring-projects/spring-data-gemfire/releases


On Tue, Jul 7, 2015 at 1:16 PM, Dan Smith  wrote:

> To support different versions of spark, wouldn't it be better to have a
> single code base that has adapters for different versions of spark? It
> seems like that would be better than maintaining several active branches
> with semi-duplicate code.
>
> I do think it would be better to keep the geode spark connector in a
> separate repository with a separate release cycle, for all of the reasons
> outlined on this thread (don't bloat the geode codebase, modularity, etc.).
> But I think there is also value in keeping it in the apache community and
> managing it through the apache process. I'm not sure how "just put it on
> github" would work out. Maybe it's just a matter of making it through the
> pain of the restrictive incubation process until we can split this code
> out. And in the mean time keeping it as loosely coupled as possible.
>
> -Dan
>
> On Tue, Jul 7, 2015 at 11:57 AM, John Blum  wrote:
>
> > +1 - Bingo, that tis the question.
> >
> > Part of the answer lies in having planned, predictable and a consistent
> > cadence of releases.
> >
> > E.g. the *Spring Data* project 
> > [0]
> > is an umbrella project managing 12 individual modules (e.g. SD... JPA,
> > Mongo, Redis, Neo4j, GemFire, Cassandra, etc, dubbed the "release train")
> > which all are at different versions and all have different external,
> > critical (driver) dependencies.  The only dependen(cy|cies) all SD
> modules
> > have in common is the version of the core *Spring Framework* and the
> > version of Spring* Data Commons*.  Otherwise individual modules upgrade
> > their "driver" dependencies at different cycles, possibly in different
> > "release train", but only when the current release train is released
> > (~every 4 weeks).  See the SD Wiki
> > <https://github.com/spring-projects/spring-data-commons/wiki/Release-planning>
> > [1] for more details.
> >
> > [0] - http://projects.spring.io/spring-data/
> > [1] -
> >
> >
> https://github.com/spring-projects/spring-data-commons/wiki/Release-planning
> >
> >
> > On Tue, Jul 7, 2015 at 11:21 AM, Gregory Chase 
> wrote:
> >
> > > More important than easy to develop is easy to pick up and use.
> > >
> > > Improving the new user experience is something that needs attention
> from
> > > Geode.  How we develop and provide Spark integration needs to take this
> > > into account.
> > >
> > > Once we are able to provide official releases, how can a user know and
> > make
> > > sure they are getting the correct plug-in version, and have relatively
> up
> > > to date support for latest Geode and Spark versions?
> > >
> > > That to me is the requirement we should be designing for first in our
> > > development process.
> > >
> > > On Tue, Jul 7, 2015 at 10:47 AM, Roman Shaposhnik <
> ro...@shaposhnik.org>
> > > wrote:
> > >
> > > > On Tue, Jul 7, 2015 at 10:34 AM, Anilkumar Gingade <
> > aging...@pivotal.io>
> > > > wrote:
> > > > > Agree...And thats the point...The connector code needs to catch up
> > with
> > > > > spark release train; if its part of Geode then the Geode releases
> > needs
> > > > to
> > > > > happen as often as Spark release (along with other planned Geode
> > > > release)...
> > > >
> > > > I don't think this is a realistic goal to have that many actively
> > > > supported branches
> > > > of Geode Spark connector.
> > > >
> > > > Look, I've been around Hadoop ecosystem for years. Nowhere the
> problem
> > of
> > > > integration with upstream is as present as in Hadoop ecosystem
> > > > (everything depends
> > > > on 

Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread Dan Smith
To support different versions of Spark, wouldn't it be better to have a
single code base that has adapters for different versions of Spark? It
seems like that would be better than maintaining several active branches
with semi-duplicate code.
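
To sketch what such an adapter layer might look like (all names here are
hypothetical, and the method shown is only a stand-in for whatever actually
changed between Spark releases):

    import org.apache.spark.SparkContext

    // The connector codes against a small shim trait...
    trait SparkShim {
      def appName(sc: SparkContext): String
    }

    // ...and each supported Spark line supplies its own implementation,
    // e.g. built from a version-specific source folder.
    class Spark13Shim extends SparkShim {
      def appName(sc: SparkContext): String = sc.appName
    }

    class Spark14Shim extends SparkShim {
      def appName(sc: SparkContext): String = sc.appName
    }

    object SparkShim {
      // chosen once at startup based on the running Spark version
      def forVersion(version: String): SparkShim =
        if (version.startsWith("1.3")) new Spark13Shim else new Spark14Shim
    }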

I do think it would be better to keep the Geode Spark connector in a
separate repository with a separate release cycle, for all of the reasons
outlined on this thread (don't bloat the Geode codebase, modularity, etc.).
But I think there is also value in keeping it in the Apache community and
managing it through the Apache process. I'm not sure how "just put it on
GitHub" would work out. Maybe it's just a matter of making it through the
pain of the restrictive incubation process until we can split this code
out. And in the meantime keeping it as loosely coupled as possible.

-Dan

On Tue, Jul 7, 2015 at 11:57 AM, John Blum  wrote:

> +1 - Bingo, that tis the question.
>
> Part of the answer lies in having planned, predictable and a consistent
> cadence of releases.
>
> E.g. the *Spring Data* project 
> [0]
> is an umbrella project managing 12 individual modules (e.g. SD... JPA,
> Mongo, Redis, Neo4j, GemFire, Cassandra, etc, dubbed the "release train")
> which all are at different versions and all have different external,
> critical (driver) dependencies.  The only dependen(cy|cies) all SD modules
> have in common is the version of the core *Spring Framework* and the
> version of Spring* Data Commons*.  Otherwise individual modules upgrade
> their "driver" dependencies at different cycles, possibly in different
> "release train", but only when the current release train is released
> (~every 4 weeks).  See the SD Wiki
> <https://github.com/spring-projects/spring-data-commons/wiki/Release-planning>
> [1] for more details.
>
> [0] - http://projects.spring.io/spring-data/
> [1] -
>
> https://github.com/spring-projects/spring-data-commons/wiki/Release-planning
>
>
> On Tue, Jul 7, 2015 at 11:21 AM, Gregory Chase  wrote:
>
> > More important than easy to develop is easy to pick up and use.
> >
> > Improving the new user experience is something that needs attention from
> > Geode.  How we develop and provide Spark integration needs to take this
> > into account.
> >
> > Once we are able to provide official releases, how can a user know and
> make
> > sure they are getting the correct plug-in version, and have relatively up
> > to date support for latest Geode and Spark versions?
> >
> > That to me is the requirement we should be designing for first in our
> > development process.
> >
> > On Tue, Jul 7, 2015 at 10:47 AM, Roman Shaposhnik 
> > wrote:
> >
> > > On Tue, Jul 7, 2015 at 10:34 AM, Anilkumar Gingade <
> aging...@pivotal.io>
> > > wrote:
> > > > Agree...And thats the point...The connector code needs to catch up
> with
> > > > spark release train; if its part of Geode then the Geode releases
> needs
> > > to
> > > > happen as often as Spark release (along with other planned Geode
> > > release)...
> > >
> > > I don't think this is a realistic goal to have that many actively
> > > supported branches
> > > of Geode Spark connector.
> > >
> > > Look, I've been around Hadoop ecosystem for years. Nowhere the problem
> of
> > > integration with upstream is as present as in Hadoop ecosystem
> > > (everything depends
> > > on everything else and everything evolves like crazy). I haven't seen a
> > > single
> > > project in that ecosystem that would be able to support a blanket
> > statement
> > > like the above. May be Geode has resources that guys depending on
> > something
> > > like HBase simply don't have.
> > >
> > > Thanks,
> > > Roman.
> > >
> >
> >
> >
> > --
> > Greg Chase
> >
> > Director of Big Data Communities
> > http://www.pivotal.io/big-data
> >
> > Pivotal Software
> > http://www.pivotal.io/
> >
> > 650-215-0477
> > @GregChase
> > Blog: http://geekmarketing.biz/
> >
>
>
>
> --
> -John
> 503-504-8657
> john.blum10101 (skype)
>


Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread John Blum
+1 - Bingo, that tis the question.

Part of the answer lies in having a planned, predictable, and consistent
cadence of releases.

E.g. the *Spring Data* project  [0]
is an umbrella project managing 12 individual modules (e.g. SD... JPA,
Mongo, Redis, Neo4j, GemFire, Cassandra, etc, dubbed the "release train")
which all are at different versions and all have different external,
critical (driver) dependencies.  The only dependen(cy|cies) all SD modules
have in common is the version of the core *Spring Framework* and the
version of *Spring Data Commons*.  Otherwise individual modules upgrade
their "driver" dependencies on different cycles, possibly in a different
"release train", but only when the current release train is released
(~every 4 weeks).  See the SD Wiki [1] for more details.

[0] - http://projects.spring.io/spring-data/
[1] -
https://github.com/spring-projects/spring-data-commons/wiki/Release-planning


On Tue, Jul 7, 2015 at 11:21 AM, Gregory Chase  wrote:

> More important than easy to develop is easy to pick up and use.
>
> Improving the new user experience is something that needs attention from
> Geode.  How we develop and provide Spark integration needs to take this
> into account.
>
> Once we are able to provide official releases, how can a user know and make
> sure they are getting the correct plug-in version, and have relatively up
> to date support for latest Geode and Spark versions?
>
> That to me is the requirement we should be designing for first in our
> development process.
>
> On Tue, Jul 7, 2015 at 10:47 AM, Roman Shaposhnik 
> wrote:
>
> > On Tue, Jul 7, 2015 at 10:34 AM, Anilkumar Gingade 
> > wrote:
> > > Agree...And thats the point...The connector code needs to catch up with
> > > spark release train; if its part of Geode then the Geode releases needs
> > to
> > > happen as often as Spark release (along with other planned Geode
> > release)...
> >
> > I don't think this is a realistic goal to have that many actively
> > supported branches
> > of Geode Spark connector.
> >
> > Look, I've been around Hadoop ecosystem for years. Nowhere the problem of
> > integration with upstream is as present as in Hadoop ecosystem
> > (everything depends
> > on everything else and everything evolves like crazy). I haven't seen a
> > single
> > project in that ecosystem that would be able to support a blanket
> statement
> > like the above. May be Geode has resources that guys depending on
> something
> > like HBase simply don't have.
> >
> > Thanks,
> > Roman.
> >
>
>
>
> --
> Greg Chase
>
> Director of Big Data Communities
> http://www.pivotal.io/big-data
>
> Pivotal Software
> http://www.pivotal.io/
>
> 650-215-0477
> @GregChase
> Blog: http://geekmarketing.biz/
>



-- 
-John
503-504-8657
john.blum10101 (skype)


Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread Jianxia Chen
I agree that the Spark Geode Connector should have its own repo.

In fact, in order to use the Spark Geode Connector, users write a Spark
application (instead of a Geode application) that calls the Spark Geode
Connector APIs.
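
For a sense of what that looks like, here is a minimal Spark application sketch;
the connector call is commented out and its name is purely hypothetical, since
the actual connector API is not being quoted here:

    import org.apache.spark.{SparkConf, SparkContext}

    object ExampleApp {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("geode-connector-example"))

        val pairs = sc.parallelize(Seq("a" -> 1, "b" -> 2))
        println(pairs.count())

        // hypothetical connector call: write an RDD of pairs into a Geode region
        // pairs.saveToGeodeRegion("exampleRegion")

        sc.stop()
      }
    }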

There are a bunch of similar Spark connector projects which connect Spark
with other data stores (e.g. Cassandra, HBase). Each of these projects has
its own independent repo, instead of living in the same repo as the data
store it supports. Please take a look at http://spark-packages.org for more
details.

I don't think having the Spark Geode Connector in its own repo will make the
Geode release difficult. On the contrary, it will be easier, because then the
Geode release doesn't have to worry about the Spark Geode Connector.


On Tue, Jul 7, 2015 at 10:35 AM, Jason Huynh  wrote:

> I agree with the github approach as the Spark connector was originally
> designed to be in its own repo with dependencies on the Spark and Geode
> jars.  I think the backwards compatibility for the Spark versions would be
> as John described, based on the sbt dependencies file.
>
> If we go with the single repo approach, as everyone has stated, we would
> want to have more releases for Geode, which would mean Geode would have at
> least as many releases as Spark.
>
>
> On Tue, Jul 7, 2015 at 10:21 AM, Kirk Lund  wrote:
>
> > The recommended ideal time for building and executing all unit tests for
> a
> > project is 10 minutes.[0][1][2]
> >
> > "Builds should be fast. Anything beyond 10 minutes becomes a dysfunction
> in
> > the process, because people won’t commit as frequently. Large builds can
> be
> > broken into multiple jobs and executed in parallel."[3]
> >
> > Now imagine packing 6 projects together into 1 project. Assuming all 6
> have
> > very fast unit tests that use Mockito then each takes 10 minutes to run
> and
> > you end up with 60 minutes for building the overall project.
> >
> > This is then heading in the opposite direction from where Geode needs to
> > go. If Geode continues to execute distributedTest and integrationTest
> from
> > the main build target then this drives it up even longer. I'd recommend
> > considering every option to reduce build time including moving
> independent
> > tools to other repos.
> >
> > I think it's more likely that other contributors will join in on the
> Spark
> > connector or JVSD or even Geode if they are isolated in their own
> projects.
> >
> > But, if group consensus is to keep everything in 1 project, then let's at
> > least talk seriously about committing to breaking up tests into multiple
> > jobs for parallel execution.
> >
> > [0] http://www.jamesshore.com/Agile-Book/ten_minute_build.html
> > [1]
> >
> >
> http://www.martinfowler.com/articles/continuousIntegration.html#KeepTheBuildFast
> > [2]
> > http://www.energizedwork.com/weblog/2006/02/ten-minute-build-continuous
> > [3]
> >
> >
> http://blogs.collab.net/devopsci/ten-best-practices-for-continuous-integration
> >
> > On Tue, Jul 7, 2015 at 9:55 AM, Anthony Baker  wrote:
> >
> > > Given the rate of change, it doesn’t seem like we should be trying to
> add
> > > (and maintain) support for every single Spark release.  We’re early in
> > the
> > > lifecycle of the Spark connector and too much emphasis on
> > > backwards-compatibility will be a drag on our ongoing development,
> > > particularly since the Spark community is valuing rapid evolution over
> > > stability.
> > >
> > > (apologies if I have misconstrued the state of Spark)
> > >
> > > Anthony
> > >
> > >
> > > > On Jul 6, 2015, at 11:22 PM, Qihong Chen  wrote:
> > > >
> > > > The problem is caused by multiple major dependencies and different
> > > release
> > > > cycles. Spark Geode Connector depends on two products: Spark and
> Geode
> > > (not
> > > > counting other dependencies), and Spark moves much faster than Geode,
> > and
> > > > some features/code are not backward compatible.
> > > >
> > > > Our initial connector implementation depends on Spark 1.2 in before
> the
> > > > last week of March 15. Then Spark 1.3 was released on the last week
> of
> > > > March, and some connector feature doesn't work with Spark 1.3, then
> we
> > > > moved on, and now support Spark 1.3 (but not 1.2 any more, we did
> > create
> > > > tag). Two weeks ago, Spark 1.4 was released, and it breaks our
> > connector
> > > > code again.
> > > >
> > > > Therefore, for each Geode release, we probably need multiple
> Connector
> > > > releases, and probably need to maintain last 2 or 3 Connector
> releases,
> > > for
> > > > example, we need to support both Spark 1.3 and 1.4 with the current
> > Geode
> > > > code.
> > > >
> > > > The question is how to support this with single source repository?
> > > >
> > > > Thanks,
> > > > Qihong
> > >
> > >
> >
>


Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread Roman Shaposhnik
On Tue, Jul 7, 2015 at 11:21 AM, Gregory Chase  wrote:
> More important than easy to develop is easy to pick up and use.
>
> Improving the new user experience is something that needs attention from
> Geode.  How we develop and provide Spark integration needs to take this
> into account.
>
> Once we are able to provide official releases, how can a user know and make
> sure they are getting the correct plug-in version, and have relatively up
> to date support for latest Geode and Spark versions?
>
> That to me is the requirement we should be designing for first in our
> development process.

Huge +1 to the above. In fact, if you look at an open source project
that is frequently held up as the gold standard of mind-share capture, Docker,
you'll notice that they credit a strategy of "batteries included, but removable"
as one of the cornerstones of their success:
https://clusterhq.com/2014/12/08/docker-extensions/

Docker hype aside, perhaps there's a lesson to be learned.

Thanks,
Roman.


Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread Gregory Chase
More important than easy to develop is easy to pick up and use.

Improving the new user experience is something that needs attention from
Geode.  How we develop and provide Spark integration needs to take this
into account.

Once we are able to provide official releases, how can a user know and make
sure they are getting the correct plug-in version, and have relatively
up-to-date support for the latest Geode and Spark versions?

That to me is the requirement we should be designing for first in our
development process.

On Tue, Jul 7, 2015 at 10:47 AM, Roman Shaposhnik 
wrote:

> On Tue, Jul 7, 2015 at 10:34 AM, Anilkumar Gingade 
> wrote:
> > Agree...And thats the point...The connector code needs to catch up with
> > spark release train; if its part of Geode then the Geode releases needs
> to
> > happen as often as Spark release (along with other planned Geode
> release)...
>
> I don't think this is a realistic goal to have that many actively
> supported branches
> of Geode Spark connector.
>
> Look, I've been around Hadoop ecosystem for years. Nowhere the problem of
> integration with upstream is as present as in Hadoop ecosystem
> (everything depends
> on everything else and everything evolves like crazy). I haven't seen a
> single
> project in that ecosystem that would be able to support a blanket statement
> like the above. May be Geode has resources that guys depending on something
> like HBase simply don't have.
>
> Thanks,
> Roman.
>



-- 
Greg Chase

Director of Big Data Communities
http://www.pivotal.io/big-data

Pivotal Software
http://www.pivotal.io/

650-215-0477
@GregChase
Blog: http://geekmarketing.biz/


Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread Roman Shaposhnik
On Tue, Jul 7, 2015 at 10:34 AM, Anilkumar Gingade  wrote:
> Agree...And thats the point...The connector code needs to catch up with
> spark release train; if its part of Geode then the Geode releases needs to
> happen as often as Spark release (along with other planned Geode release)...

I don't think it is a realistic goal to have that many actively supported
branches of the Geode Spark connector.

Look, I've been around the Hadoop ecosystem for years. Nowhere is the problem
of integration with upstream as pronounced as in the Hadoop ecosystem
(everything depends on everything else and everything evolves like crazy).
I haven't seen a single project in that ecosystem that would be able to
support a blanket statement like the one above. Maybe Geode has resources
that folks depending on something like HBase simply don't have.

Thanks,
Roman.


Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread Jason Huynh
I agree with the github approach as the Spark connector was originally
designed to be in its own repo with dependencies on the Spark and Geode
jars.  I think the backwards compatibility for the Spark versions would be
as John described, based on the sbt dependencies file.

If we go with the single repo approach, as everyone has stated, we would
want to have more releases for Geode, which would mean Geode would have at
least as many releases as Spark.


On Tue, Jul 7, 2015 at 10:21 AM, Kirk Lund  wrote:

> The recommended ideal time for building and executing all unit tests for a
> project is 10 minutes.[0][1][2]
>
> "Builds should be fast. Anything beyond 10 minutes becomes a dysfunction in
> the process, because people won’t commit as frequently. Large builds can be
> broken into multiple jobs and executed in parallel."[3]
>
> Now imagine packing 6 projects together into 1 project. Assuming all 6 have
> very fast unit tests that use Mockito then each takes 10 minutes to run and
> you end up with 60 minutes for building the overall project.
>
> This is then heading in the opposite direction from where Geode needs to
> go. If Geode continues to execute distributedTest and integrationTest from
> the main build target then this drives it up even longer. I'd recommend
> considering every option to reduce build time including moving independent
> tools to other repos.
>
> I think it's more likely that other contributors will join in on the Spark
> connector or JVSD or even Geode if they are isolated in their own projects.
>
> But, if group consensus is to keep everything in 1 project, then let's at
> least talk seriously about committing to breaking up tests into multiple
> jobs for parallel execution.
>
> [0] http://www.jamesshore.com/Agile-Book/ten_minute_build.html
> [1]
>
> http://www.martinfowler.com/articles/continuousIntegration.html#KeepTheBuildFast
> [2]
> http://www.energizedwork.com/weblog/2006/02/ten-minute-build-continuous
> [3]
>
> http://blogs.collab.net/devopsci/ten-best-practices-for-continuous-integration
>
> On Tue, Jul 7, 2015 at 9:55 AM, Anthony Baker  wrote:
>
> > Given the rate of change, it doesn’t seem like we should be trying to add
> > (and maintain) support for every single Spark release.  We’re early in
> the
> > lifecycle of the Spark connector and too much emphasis on
> > backwards-compatibility will be a drag on our ongoing development,
> > particularly since the Spark community is valuing rapid evolution over
> > stability.
> >
> > (apologies if I have misconstrued the state of Spark)
> >
> > Anthony
> >
> >
> > > On Jul 6, 2015, at 11:22 PM, Qihong Chen  wrote:
> > >
> > > The problem is caused by multiple major dependencies and different
> > release
> > > cycles. Spark Geode Connector depends on two products: Spark and Geode
> > (not
> > > counting other dependencies), and Spark moves much faster than Geode,
> and
> > > some features/code are not backward compatible.
> > >
> > > Our initial connector implementation depends on Spark 1.2 in before the
> > > last week of March 15. Then Spark 1.3 was released on the last week of
> > > March, and some connector feature doesn't work with Spark 1.3, then we
> > > moved on, and now support Spark 1.3 (but not 1.2 any more, we did
> create
> > > tag). Two weeks ago, Spark 1.4 was released, and it breaks our
> connector
> > > code again.
> > >
> > > Therefore, for each Geode release, we probably need multiple Connector
> > > releases, and probably need to maintain last 2 or 3 Connector releases,
> > for
> > > example, we need to support both Spark 1.3 and 1.4 with the current
> Geode
> > > code.
> > >
> > > The question is how to support this with single source repository?
> > >
> > > Thanks,
> > > Qihong
> >
> >
>


Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread Anilkumar Gingade
Agreed... and that's the point... The connector code needs to keep up with the
Spark release train; if it's part of Geode then Geode releases need to
happen as often as Spark releases (along with other planned Geode releases)...

Even if the connector code is compatible with the latest Spark, the previous
connector release/version still has to be maintained, as end users (most of
the enterprise customers) are still working with older versions.

I'd like to get closure on this topic/thread; how do we do that?

Thanks,
-Anil.






On Tue, Jul 7, 2015 at 9:55 AM, Anthony Baker  wrote:

> Given the rate of change, it doesn’t seem like we should be trying to add
> (and maintain) support for every single Spark release.  We’re early in the
> lifecycle of the Spark connector and too much emphasis on
> backwards-compatibility will be a drag on our ongoing development,
> particularly since the Spark community is valuing rapid evolution over
> stability.
>
> (apologies if I have misconstrued the state of Spark)
>
> Anthony
>
>
> > On Jul 6, 2015, at 11:22 PM, Qihong Chen  wrote:
> >
> > The problem is caused by multiple major dependencies and different
> release
> > cycles. Spark Geode Connector depends on two products: Spark and Geode
> (not
> > counting other dependencies), and Spark moves much faster than Geode, and
> > some features/code are not backward compatible.
> >
> > Our initial connector implementation depends on Spark 1.2 in before the
> > last week of March 15. Then Spark 1.3 was released on the last week of
> > March, and some connector feature doesn't work with Spark 1.3, then we
> > moved on, and now support Spark 1.3 (but not 1.2 any more, we did create
> > tag). Two weeks ago, Spark 1.4 was released, and it breaks our connector
> > code again.
> >
> > Therefore, for each Geode release, we probably need multiple Connector
> > releases, and probably need to maintain last 2 or 3 Connector releases,
> for
> > example, we need to support both Spark 1.3 and 1.4 with the current Geode
> > code.
> >
> > The question is how to support this with single source repository?
> >
> > Thanks,
> > Qihong
>
>


Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread Eric Pederson
I would vote to support at least the previous Spark release.  The big
Hadoop distros are usually a version behind in their Spark support.  For
example, we use MapR, which in its latest release (4.1.0) only supports
Spark 1.2.1 and 1.3.1.

-- Eric

On Tue, Jul 7, 2015 at 12:55 PM, Anthony Baker  wrote:

> Given the rate of change, it doesn’t seem like we should be trying to add
> (and maintain) support for every single Spark release.  We’re early in the
> lifecycle of the Spark connector and too much emphasis on
> backwards-compatibility will be a drag on our ongoing development,
> particularly since the Spark community is valuing rapid evolution over
> stability.
>
> (apologies if I have misconstrued the state of Spark)
>
> Anthony
>
>
> > On Jul 6, 2015, at 11:22 PM, Qihong Chen  wrote:
> >
> > The problem is caused by multiple major dependencies and different
> release
> > cycles. Spark Geode Connector depends on two products: Spark and Geode
> (not
> > counting other dependencies), and Spark moves much faster than Geode, and
> > some features/code are not backward compatible.
> >
> > Our initial connector implementation depends on Spark 1.2 in before the
> > last week of March 15. Then Spark 1.3 was released on the last week of
> > March, and some connector feature doesn't work with Spark 1.3, then we
> > moved on, and now support Spark 1.3 (but not 1.2 any more, we did create
> > tag). Two weeks ago, Spark 1.4 was released, and it breaks our connector
> > code again.
> >
> > Therefore, for each Geode release, we probably need multiple Connector
> > releases, and probably need to maintain last 2 or 3 Connector releases,
> for
> > example, we need to support both Spark 1.3 and 1.4 with the current Geode
> > code.
> >
> > The question is how to support this with single source repository?
> >
> > Thanks,
> > Qihong
>
>


Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread Kirk Lund
The recommended ideal time for building and executing all unit tests for a
project is 10 minutes.[0][1][2]

"Builds should be fast. Anything beyond 10 minutes becomes a dysfunction in
the process, because people won’t commit as frequently. Large builds can be
broken into multiple jobs and executed in parallel."[3]

Now imagine packing 6 projects together into 1 project. Assuming all 6 have
very fast unit tests that use Mockito, each takes 10 minutes to run, and
you end up with 60 minutes to build the overall project.

This is heading in the opposite direction from where Geode needs to
go. If Geode continues to execute distributedTest and integrationTest from
the main build target, then this drives the build time up even further.
I'd recommend considering every option to reduce build time, including
moving independent tools to other repos.

I think it's more likely that other contributors will join in on the Spark
connector or JVSD or even Geode if they are isolated in their own projects.

But, if group consensus is to keep everything in 1 project, then let's at
least talk seriously about committing to breaking up tests into multiple
jobs for parallel execution.

[0] http://www.jamesshore.com/Agile-Book/ten_minute_build.html
[1]
http://www.martinfowler.com/articles/continuousIntegration.html#KeepTheBuildFast
[2] http://www.energizedwork.com/weblog/2006/02/ten-minute-build-continuous
[3]
http://blogs.collab.net/devopsci/ten-best-practices-for-continuous-integration

On Tue, Jul 7, 2015 at 9:55 AM, Anthony Baker  wrote:

> Given the rate of change, it doesn’t seem like we should be trying to add
> (and maintain) support for every single Spark release.  We’re early in the
> lifecycle of the Spark connector and too much emphasis on
> backwards-compatibility will be a drag on our ongoing development,
> particularly since the Spark community is valuing rapid evolution over
> stability.
>
> (apologies if I have misconstrued the state of Spark)
>
> Anthony
>
>
> > On Jul 6, 2015, at 11:22 PM, Qihong Chen  wrote:
> >
> > The problem is caused by multiple major dependencies and different
> release
> > cycles. Spark Geode Connector depends on two products: Spark and Geode
> (not
> > counting other dependencies), and Spark moves much faster than Geode, and
> > some features/code are not backward compatible.
> >
> > Our initial connector implementation depends on Spark 1.2 in before the
> > last week of March 15. Then Spark 1.3 was released on the last week of
> > March, and some connector feature doesn't work with Spark 1.3, then we
> > moved on, and now support Spark 1.3 (but not 1.2 any more, we did create
> > tag). Two weeks ago, Spark 1.4 was released, and it breaks our connector
> > code again.
> >
> > Therefore, for each Geode release, we probably need multiple Connector
> > releases, and probably need to maintain last 2 or 3 Connector releases,
> for
> > example, we need to support both Spark 1.3 and 1.4 with the current Geode
> > code.
> >
> > The question is how to support this with single source repository?
> >
> > Thanks,
> > Qihong
>
>


Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread Anthony Baker
Given the rate of change, it doesn’t seem like we should be trying to add (and 
maintain) support for every single Spark release.  We’re early in the lifecycle 
of the Spark connector and too much emphasis on backwards-compatibility will be 
a drag on our ongoing development, particularly since the Spark community is 
valuing rapid evolution over stability.

(apologies if I have misconstrued the state of Spark)

Anthony


> On Jul 6, 2015, at 11:22 PM, Qihong Chen  wrote:
> 
> The problem is caused by multiple major dependencies and different release
> cycles. Spark Geode Connector depends on two products: Spark and Geode (not
> counting other dependencies), and Spark moves much faster than Geode, and
> some features/code are not backward compatible.
> 
> Our initial connector implementation depends on Spark 1.2 in before the
> last week of March 15. Then Spark 1.3 was released on the last week of
> March, and some connector feature doesn't work with Spark 1.3, then we
> moved on, and now support Spark 1.3 (but not 1.2 any more, we did create
> tag). Two weeks ago, Spark 1.4 was released, and it breaks our connector
> code again.
> 
> Therefore, for each Geode release, we probably need multiple Connector
> releases, and probably need to maintain last 2 or 3 Connector releases, for
> example, we need to support both Spark 1.3 and 1.4 with the current Geode
> code.
> 
> The question is how to support this with single source repository?
> 
> Thanks,
> Qihong



Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread John Blum
> *for each Geode release, we probably need multiple Connector releases,
and probably need to maintain the last 2 or 3 Connector releases, for example,
we need to support both Spark 1.3 and 1.4 with the current Geode code.*

Exactly my point for maintaining the GemFire/Geode Spark Connector as a
separate, individually releasable artifact with its own repo.

*Spring Data GemFire* is no different, and you would be *mistaken* if you
think *GemFire/Geode* is the only dependency *Spring Data GemFire* has.  In
fact, SDG depends on the core *Spring Framework* as well as *Spring Data
Commons* from the *Spring* ecosystem, along with many other 3rd party
dependencies, even some that overlap with GemFire/Geode (e.g. Jackson).

So what I "mean" when I say, "**How do you know* which version of Geode, or
any other dependency (e.g. Apache Spark) for that matter, the GemFire/Geode
Spark Connector depends on*"... you need to look at the Maven POM or the
Gradle dependency declarations (e.g. SDG gradle.properties

[0]).
The "declared" dependency of Geode is the version upon which a user is able
to run the connector, that version and that version alone unless the
sub-project states otherwise (.e.g. Geode version 1.0 to 1.5 supported,
Spark 1.2 and 1.3 supported).
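
The same idea, sketched in sbt terms purely for illustration (the coordinates
and version numbers here are placeholders, not what the connector actually
declares):

    // build.sbt (sketch): the declared versions are the versions the connector is
    // developed and tested against, and therefore the versions users should assume it supports.
    val geodeVersion = "1.0.0-incubating"
    val sparkVersion = "1.3.1"

    libraryDependencies ++= Seq(
      "org.apache.geode" %  "geode-core" % geodeVersion,
      "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
    )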

If you try to bundle all the various modules (Spark Connector, Redis,
Memcached, etc.) you are going to have a dependency management nightmare, and
you will stifle any individual module's (e.g. the Spark Connector's) ability to
support newer versions of important dependencies (e.g. Spark) beyond Geode,
because the module will be tied to Geode's release cycle, so the module
will only be able to be released when Geode is ready.

My $0.02.

Thanks,
John



[0] -
https://github.com/spring-projects/spring-data-gemfire/blob/master/gradle.properties


On Mon, Jul 6, 2015 at 11:22 PM, Qihong Chen  wrote:

> The problem is caused by multiple major dependencies and different release
> cycles. Spark Geode Connector depends on two products: Spark and Geode (not
> counting other dependencies), and Spark moves much faster than Geode, and
> some features/code are not backward compatible.
>
> Our initial connector implementation depends on Spark 1.2 in before the
> last week of March 15. Then Spark 1.3 was released on the last week of
> March, and some connector feature doesn't work with Spark 1.3, then we
> moved on, and now support Spark 1.3 (but not 1.2 any more, we did create
> tag). Two weeks ago, Spark 1.4 was released, and it breaks our connector
> code again.
>
> Therefore, for each Geode release, we probably need multiple Connector
> releases, and probably need to maintain last 2 or 3 Connector releases, for
> example, we need to support both Spark 1.3 and 1.4 with the current Geode
> code.
>
> The question is how to support this with single source repository?
>
> Thanks,
> Qihong
>



-- 
-John
503-504-8657
john.blum10101 (skype)


Re: Where to place "Spark + GemFire" connector.

2015-07-07 Thread Kirk Lund
I would think that GitHub would be a better option for the Spark Geode
Connector. That way it's not tightly coupled to the Geode release cycle.

I don't see why it's desirable to bloat Geode with every single script,
tool, or connector that might interact with Geode.

Another reason to consider separating projects is testing. Do we really
want the Geode build to run unit/integration/e2e tests for Geode, JVSD,
Spark connector, etc? If someone isn't working on the Spark connector,
should they be forced to execute the tests for it before committing? I
think the Geode tests alone are going to push the limits of what ASF allows
in Jenkins.

-Kirk


On Mon, Jul 6, 2015 at 11:22 PM, Qihong Chen  wrote:

> The problem is caused by multiple major dependencies with different release
> cycles. The Spark Geode Connector depends on two products: Spark and Geode
> (not counting other dependencies); Spark moves much faster than Geode, and
> some features/code are not backward compatible.
>
> Our initial connector implementation depended on Spark 1.2, before the last
> week of March 2015. Then Spark 1.3 was released in the last week of March,
> and some connector features didn't work with Spark 1.3, so we moved on and
> now support Spark 1.3 (but not 1.2 any more, though we did create a tag).
> Two weeks ago, Spark 1.4 was released, and it broke our connector code
> again.
>
> Therefore, for each Geode release, we probably need multiple Connector
> releases, and probably need to maintain the last 2 or 3 Connector releases;
> for example, we need to support both Spark 1.3 and 1.4 with the current
> Geode code.
>
> The question is how to support this with a single source repository?
>
> Thanks,
> Qihong
>


Re: Where to place "Spark + GemFire" connector.

2015-07-06 Thread Qihong Chen
The problem is caused by multiple major dependencies with different release
cycles. The Spark Geode Connector depends on two products: Spark and Geode
(not counting other dependencies); Spark moves much faster than Geode, and
some features/code are not backward compatible.

Our initial connector implementation depended on Spark 1.2, before the last
week of March 2015. Then Spark 1.3 was released in the last week of March,
and some connector features didn't work with Spark 1.3, so we moved on and
now support Spark 1.3 (but not 1.2 any more, though we did create a tag).
Two weeks ago, Spark 1.4 was released, and it broke our connector code
again.

Therefore, for each Geode release, we probably need multiple Connector
releases, and probably need to maintain the last 2 or 3 Connector releases;
for example, we need to support both Spark 1.3 and 1.4 with the current
Geode code.

The question is how to support this with a single source repository?

Thanks,
Qihong
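
One way a single source repository could cope with this, sketched here on the
assumption of a Gradle build (the coordinates and versions below are
illustrative, not the connector's actual ones), is to make the Spark version a
build-time switch and keep any version-specific glue code in per-version
source directories:

  // build.gradle (sketch): pick the Spark line at build time, e.g.
  //   gradle build -PsparkVersion=1.4.0
  apply plugin: 'scala'

  repositories {
      mavenCentral()
  }

  // default to the oldest supported Spark line; override with -PsparkVersion=...
  def sparkVersion = project.hasProperty('sparkVersion') ? project.sparkVersion : '1.3.1'

  dependencies {
      // illustrative coordinates; the connector pins the one Geode version
      // it was developed and tested against
      compile "org.apache.geode:gemfire-core:1.0.0-incubating"
      compile "org.apache.spark:spark-core_2.10:${sparkVersion}"
  }

  // code that differs between Spark lines lives in per-version directories,
  // e.g. src/main/spark-1.3 vs. src/main/spark-1.4
  sourceSets.main.scala.srcDir "src/main/spark-" + sparkVersion.tokenize('.')[0..1].join('.')

An alternative, also raised in this thread, is to compile against the oldest
supported Spark version and only verify against newer ones, which works only
as long as the Spark APIs in use remain compatible.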


Re: Where to place "Spark + GemFire" connector.

2015-07-06 Thread Roman Shaposhnik
On Mon, Jul 6, 2015 at 3:10 PM, John Blum  wrote:
>> if you unbundle your spark connector from Geode releases how do you know
> that a given Geode release actually works with it?
>
> Because the Spark Connector will depend on (i.e. have been developed and
> tested with) a specific version of Apache Geode, and is not guaranteed to
> work with downstream releases after that version, nor necessarily before
> that version, though, generally, backwards compatibility is always
> preferable.

That's my very point actually ;-)

Thanks,
Roman.


Re: Where to place "Spark + GemFire" connector.

2015-07-06 Thread John Blum
> if you unbundle your spark connector from Geode releases how do you know
> that a given Geode release actually works with it?

Because the Spark Connector will depend on (i.e. have been developed and
tested with) a specific version of Apache Geode, and is not guaranteed to
work with downstream releases after that version, nor necessarily before
that version, though, generally, backwards compatibility is always
preferable.

The Spring IO platform is a prime example of a curated set of tested,
guaranteed-to-work dependencies within and across the entire Spring
ecosystem, including all 3rd-party dependencies (some 300+ in total).

This is exactly the relationship [0] between Spring Data GemFire and Apache
Geode right now.
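
For readers unfamiliar with how such a curated platform is consumed, here is
a rough sketch; the plugin and BOM versions shown are illustrative, and
Gradle with the Spring dependency-management plugin is just one way to
import it:

  // build.gradle (sketch): versions come from a curated platform BOM
  buildscript {
      repositories { mavenCentral() }
      dependencies {
          classpath 'io.spring.gradle:dependency-management-plugin:0.5.2.RELEASE'
      }
  }

  apply plugin: 'java'
  apply plugin: 'io.spring.dependency-management'

  repositories {
      mavenCentral()
  }

  dependencyManagement {
      imports {
          // the Spring IO Platform BOM pins tested versions across the ecosystem
          mavenBom 'io.spring.platform:platform-bom:1.1.3.RELEASE'
      }
  }

  dependencies {
      // no explicit versions: they are resolved from the imported BOM
      compile 'org.springframework.data:spring-data-gemfire'
      compile 'com.fasterxml.jackson.core:jackson-databind'
  }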

-j

[0] -
https://github.com/spring-projects/spring-data-gemfire/blob/apache-geode/gradle.properties#L3


On Mon, Jul 6, 2015 at 2:58 PM, Roman Shaposhnik 
wrote:

> On Thu, Jul 2, 2015 at 5:39 PM, Anthony Baker  wrote:
> >>
> >> We are wondering whether to have this as part of the Geode repo or in a
> >> separate public GitHub repo?
> >>
> >
> > I think the spark connector belongs in the geode community, which
> implies the geode ASF repo.
> > I think we can address the other concerns technically.
>
> I am very much +1 on this. The biggest hurdle the Geode community has in
> front of it is having its first ASF incubating release. Having one artifact
> to worry about would be way simpler.
>
> Later on, you may entertain the question of having an independent
> subproject, but that's later.
>
> >> General Question:
> >> Can a module under the Geode repo be released independently of a Geode
> >> release? E.g., can we release the connector without it being tied to a
> >> Geode release?
> >
> > This is an interesting question I don’t know the answer to.  However, I
> think we can
> > handle this by creating a geode release frequently enough to satisfy our
> community.
>
> Huge +1 to the above. Release early, release often and validate
> strongly. The later
> point is actually very much relevant to this discussion: if you
> unbundle your spark
> connector from Geode releases how do you know that a given Geode
> release actually
> works with it? I hope you test. But once you do -- you may as well release
> it.
>
> Thanks,
> Roman.
>



-- 
-John
503-504-8657
john.blum10101 (skype)


Re: Where to place "Spark + GemFire" connector.

2015-07-06 Thread Roman Shaposhnik
// top posting since I'm basically agreeing with your point ;-)

you're absolutely right -- that level of modularity is my personal
nirvana when it comes to Geode. That said, I'd rather manage
it as a single artifact at release time for at least as long as it
takes to graduate from the Incubator. After that, once the community
has demonstrated its skill in managing releases, all sorts of asynchronous
release options become possible.


Thanks,
Roman.

On Thu, Jul 2, 2015 at 5:56 PM, John Blum  wrote:
> Personally, I would like to see Apache Geode become more modular, even down
> to the key low-level functional components, or features of Geode (such as
> Querying/Indexing, Persistence, Compression, Security,
> Management/Monitoring, Function Execution, even Membership, etc, etc). Of
> course, such fine-grained modularity at this point will be very difficult
> to achieve in the short-term given the unclear delineation of concerns in
> the code, but certainly high-level features such as the Spark Integration,
> along with other good examples, such as the eventual HTTP Session
> Management, Hibernate support, Memcached integration along with the
> eventual rollout of the Redis integration, or even our tooling (jVSD, Gfsh,
> etc, etc) are prime candidates to keep separate, with individual
> deliverables.
>
> These "other modules" should consume Geode artifacts and not be directly
> tied to the Geode "core" (codebase), thus making Geode more modular,
> extensible, configurable with different provider implementations
> (conforming to well-defined "SPIs") etc.
>
> Spring Data GemFire is one such example that "consumes" GemFire/Geode
> artifacts and evolves concurrently, but separately.  More add-ons/plugins
> should evolve the same way, and Geode should be the "core", umbrella
> project for all the satellite efforts, IMO.
>
> -John
>
>
> On Thu, Jul 2, 2015 at 5:39 PM, Anthony Baker  wrote:
>
>> >
>> > We are wondering whether to have this as part of the Geode repo or in a
>> > separate public GitHub repo?
>> >
>>
>> I think the spark connector belongs in the geode community, which implies
>> the geode ASF repo.  I think we can address the other concerns technically.
>>
>> > General Question:
>> > Can a module under the Geode repo be released independently of a Geode
>> > release? E.g., can we release the connector without it being tied to a
>> > Geode release?
>>
>> This is an interesting question I don’t know the answer to.  However, I
>> think we can handle this by creating a geode release frequently enough to
>> satisfy our community.  For example, if there is a new spark version
>> available we can determine if there is value to the community in creating a
>> release (geode + spark connector) containing that support.  Another option
>> to explore is to create a looser coupling such that the spark connector can
>> work across multiple spark versions (I know this is possible with Hadoop,
>> not sure about Spark).
>>
>> >
>> > Any input/suggestions?
>> >
>> > Thanks,
>> > -Anil.
>>
>> Anthony
>>
>>
>
>
> --
> -John
> 503-504-8657
> john.blum10101 (skype)


Re: Where to place "Spark + GemFire" connector.

2015-07-06 Thread Roman Shaposhnik
On Thu, Jul 2, 2015 at 5:39 PM, Anthony Baker  wrote:
>>
>> We are wondering whether to have this as part of the Geode repo or in a
>> separate public GitHub repo?
>>
>
> I think the spark connector belongs in the geode community, which implies the 
> geode ASF repo.
> I think we can address the other concerns technically.

I am very much +1 on this. The biggest hurdle the Geode community has in front
of it is having its first ASF incubating release. Having one artifact to worry
about would be way simpler.

Later on, you may entertain the question of having an independent subproject,
but that's later.

>> General Question:
>> Can a module under the Geode repo be released independently of a Geode
>> release? E.g., can we release the connector without it being tied to a
>> Geode release?
>
> This is an interesting question I don’t know the answer to.  However, I think 
> we can
> handle this by creating a geode release frequently enough to satisfy our 
> community.

Huge +1 to the above. Release early, release often, and validate strongly.
The latter point is actually very much relevant to this discussion: if you
unbundle your Spark connector from Geode releases, how do you know that a
given Geode release actually works with it? I hope you test. But once you
do -- you may as well release it.

Thanks,
Roman.


Re: Where to place "Spark + GemFire" connector.

2015-07-06 Thread Darrel Schneider
The "feature/GEODE-9" branch has just been created. In addition to the core
geode code it also has a "gemfire-spark-connector" sub-directory from the
recent geode code drop.

On Thu, Jul 2, 2015 at 5:56 PM, John Blum  wrote:

> Personally, I would like to see Apache Geode become more modular, even down
> to the key low-level functional components, or features of Geode (such as
> Querying/Indexing, Persistence, Compression, Security,
> Management/Monitoring, Function Execution, even Membership, etc, etc). Of
> course, such fine-grained modularity at this point will be very difficult
> to achieve in the short-term given the unclear delineation of concerns in
> the code, but certainly high-level features such as the Spark Integration,
> along with other good examples, such as the eventual HTTP Session
> Management, Hibernate support, Memcached integration along with the
> eventual rollout of the Redis integration, or even our tooling (jVSD, Gfsh,
> etc, etc) are prime candidates to keep separate, with individual
> deliverables.
>
> These "other modules" should consume Geode artifacts and not be directly
> tied to the Geode "core" (codebase), thus making Geode more modular,
> extensible, configurable with different provider implementations
> (conforming to well-defined "SPIs") etc.
>
> Spring Data GemFire is one such example that "consumes" GemFire/Geode
> artifacts and evolves concurrently, but separately.  More add-ons/plugins
> should evolve the same way, and Geode should be the "core", umbrella
> project for all the satellite efforts, IMO.
>
> -John
>
>
> On Thu, Jul 2, 2015 at 5:39 PM, Anthony Baker  wrote:
>
> > >
> > > We are wondering whether to have this as part of the Geode repo or in a
> > > separate public GitHub repo?
> > >
> >
> > I think the spark connector belongs in the geode community, which implies
> > the geode ASF repo.  I think we can address the other concerns
> technically.
> >
> > > General Question:
> > > Can a module under the Geode repo be released independently of a Geode
> > > release? E.g., can we release the connector without it being tied to a
> > > Geode release?
> >
> > This is an interesting question I don’t know the answer to.  However, I
> > think we can handle this by creating a geode release frequently enough to
> > satisfy our community.  For example, if there is a new spark version
> > available we can determine if there is value to the community in
> creating a
> > release (geode + spark connector) containing that support.  Another
> option
> > to explore is to create a looser coupling such that the spark connector
> can
> > work across multiple spark versions (I know this is possible with Hadoop,
> > not sure about Spark).
> >
> > >
> > > Any input/suggestions?
> > >
> > > Thanks,
> > > -Anil.
> >
> > Anthony
> >
> >
>
>
> --
> -John
> 503-504-8657
> john.blum10101 (skype)
>


Re: Where to place "Spark + GemFire" connector.

2015-07-02 Thread John Blum
Personally, I would like to see Apache Geode become more modular, even down
to the key low-level functional components, or features of Geode (such as
Querying/Indexing, Persistence, Compression, Security,
Management/Monitoring, Function Execution, even Membership, etc, etc). Of
course, such fine-grained modularity at this point will be very difficult
to achieve in the short-term given the unclear delineation of concerns in
the code, but certainly high-level features such as the Spark Integration,
along with other good examples, such as the eventual HTTP Session
Management, Hibernate support, Memcached integration along with the
eventual rollout of the Redis integration, or even our tooling (jVSD, Gfsh,
etc, etc) are prime candidates to keep separate, with individual
deliverables.

These "other modules" should consume Geode artifacts and not be directly
tied to the Geode "core" (codebase), thus making Geode more modular,
extensible, configurable with different provider implementations
(conforming to well-defined "SPIs") etc.

Spring Data GemFire is one such example that "consumes" GemFire/Geode
artifacts and evolves concurrently, but separately.  More add-ons/plugins
should evolve the same way, and Geode should be the "core", umbrella
project for all the satellite efforts, IMO.
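
To sketch, in build terms, what "consuming Geode artifacts" rather than being
tied to the core codebase might look like (the module name, coordinates, and
version below are hypothetical):

  // Inside the Geode source tree, a module is coupled to the core build:
  //     dependencies { compile project(':gemfire-core') }
  //
  // A satellite project instead depends on a released artifact
  // (hypothetical coordinates and version):
  apply plugin: 'java'

  repositories {
      mavenCentral()
  }

  dependencies {
      compile 'org.apache.geode:gemfire-core:1.0.0-incubating'
  }

The satellite project can then pick up a newer Geode artifact on its own
schedule, which is exactly the decoupling described above.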

-John


On Thu, Jul 2, 2015 at 5:39 PM, Anthony Baker  wrote:

> >
> > We are wondering whether to have this as part of the Geode repo or in a
> > separate public GitHub repo?
> >
>
> I think the spark connector belongs in the geode community, which implies
> the geode ASF repo.  I think we can address the other concerns technically.
>
> > General Question:
> > Can a module under the Geode repo be released independently of a Geode
> > release? E.g., can we release the connector without it being tied to a
> > Geode release?
>
> This is an interesting question I don’t know the answer to.  However, I
> think we can handle this by creating a geode release frequently enough to
> satisfy our community.  For example, if there is a new spark version
> available we can determine if there is value to the community in creating a
> release (geode + spark connector) containing that support.  Another option
> to explore is to create a looser coupling such that the spark connector can
> work across multiple spark versions (I know this is possible with Hadoop,
> not sure about Spark).
>
> >
> > Any input/suggestions?
> >
> > Thanks,
> > -Anil.
>
> Anthony
>
>


-- 
-John
503-504-8657
john.blum10101 (skype)


Re: Where to place "Spark + GemFire" connector.

2015-07-02 Thread Anthony Baker
> 
> We are wondering whether to have this as part of the Geode repo or in a
> separate public GitHub repo?
> 

I think the spark connector belongs in the geode community, which implies the 
geode ASF repo.  I think we can address the other concerns technically.

> General Question:
> Can a module under the Geode repo be released independently of a Geode
> release? E.g., can we release the connector without it being tied to a
> Geode release?

This is an interesting question I don’t know the answer to.  However, I think 
we can handle this by creating a geode release frequently enough to satisfy our 
community.  For example, if there is a new spark version available we can 
determine if there is value to the community in creating a release (geode + 
spark connector) containing that support.  Another option to explore is to 
create a looser coupling such that the spark connector can work across multiple 
spark versions (I know this is possible with Hadoop, not sure about Spark).

> 
> Any input/suggestions?
> 
> Thanks,
> -Anil.

Anthony



Where to place "Spark + GemFire" connector.

2015-07-02 Thread Anilkumar Gingade
Hi Team,

We have built a "Spark + Geode" connector, which allows users to write Spark
applications that store/retrieve/query RDDs to/from a Geode cluster.

We are wondering whether to have this as part of the Geode repo or in a
separate public GitHub repo?

Why are we thinking about a separate GitHub repo:
- The connector is driven more by Spark applications storing/retrieving data
from Geode; it aligns more with Spark than with Geode.
- It has a simpler build structure (using sbt) than the current Geode build.
- Spark releases are frequent and the APIs change from release to release;
keeping the connector separate lets us do connector releases following each
Spark release without a Geode release dependency.
- It is easier to manage/maintain the connector against a specific Spark
version.

General Question:
Can a module under the Geode repo be released independently of a Geode
release? E.g., can we release the connector without it being tied to a Geode
release?

Any input/suggestions?

Thanks,
-Anil.