Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-01 Thread Sean Owen
On Wed, Jun 1, 2016 at 5:58 PM, Reynold Xin  wrote:
> The preview release is available here:
> http://spark.apache.org/downloads.html (there is an entire section dedicated
> to it and also there is a news link to it on the right).

Oops, it is indeed down there at the bottom, before the nightlies. I
honestly missed it below the fold. I'd advocate for making it a
(non-default?) option in the main downloads dropdown, but then this becomes
a minor issue. The core source/binary artifacts _are_ publicly
available.


> "In addition to the distribution directory, project that use Maven or a
> related build tool sometimes place their releases on repository.apache.org
> beside some convenience binaries. The distribution directory is required,
> while the repository system is an optional convenience."

Agreed. The question is what makes this release special, because other
releases have been published to Maven. I think the argument is that
it's a buggy alpha/beta/preview release, but so were 0.x releases.
Reasonable people could make up different policies, so here I'm
appealing to guidance: http://www.apache.org/dev/release.html

"Releases are packages that have been approved for general public
release, with varying degrees of caveat regarding their perceived
quality or potential for change. Releases that are intended for
everyday usage by non-developers are usually referred to as "stable"
or "general availability (GA)" releases. Releases that are believed to
be usable by testers and developers outside the project, but perhaps
not yet stable in terms of features or functionality, are usually
referred to as "beta" or "unstable". Releases that only represent a
project milestone and are intended only for bleeding-edge developers
working outside the project are called "alpha"."

I don't think releases are defined by whether they're stable or buggy,
but by whether they were produced by a sanctioned process that
protects contributors under the ASF umbrella, etc etc. Compare to a
nightly build which we don't want everyone to consume, not so much
because it might be buggier, but because these protections don't
apply.

Certainly, it's vital to communicate how to interpret the stability of
the releases, but -preview releases are still normal releases to the
public.

I don't think bugginess, therefore, is the question. Any Spark dev knows
that x.y.0 Spark releases have gone out with Critical and, in the past,
even Blocker issues unresolved, and the world failed to fall apart.
(We're better about this now.) I actually think the -preview release
idea is worth repeating for this reason: .0-preview is the new .0.
It'd be more accurate IMHO and better for all.


> I think it'd be pretty bad if preview releases in any way become the "default
> version", because they are unstable and contain a lot of blocker bugs.

Why would this happen? Releases happen roughly every 3 months and could happen
faster if this is a concern. 2.0.0 final is, I'd wager, coming in under a
month.


> 2. On the download page, have two sections. One listing the normal releases,
> and the other listing preview releases.

+1, that puts it above the fold and makes it easily findable by anyone willing
to consume such a thing.


> 3. Everywhere we mention preview releases, include the proper disclaimer
> e.g. "This preview is not a stable release in terms of either API or
> functionality, but it is meant to give the community early access to try the
> code that will become Spark 2.0."

Can't hurt to overcommunicate this for -preview releases in general.


> 4. Publish normal releases to maven central, and preview releases only to
> the staging maven repo. But of course we should include the temporary maven
> repo for preview releases on the download page.

This is the only thing I disagree with. AFAIK other ASF projects
readily publish alpha and beta releases, under varying naming
conventions (alpha, beta, RC1, etc.). It's not something that needs to
be hidden like a nightly.

The audience for Maven artifacts is developers, not admins or users.
Compare the risk of a developer somehow not understanding what they're
getting, to the friction caused by making developers add a repo to get
at it.

I get it, that seems minor. But given the recent concern about making
sure "2.0.0 preview" is available as an ASF release, I'd advise us to
make sure this release is not any harder to get at than others, to
really put that to bed.




OOM while unrolling in Spark 1.5.2

2016-06-01 Thread Thomas Gerber
Hello,

I came across a weird, not easily reproducible OOM in an executor during
unrolling. The standalone cluster uses the default memory settings on Spark
1.5.2.

What strikes me is that the OOM happens when Spark tries to allocate a
ByteBuffer for the FileChannel while dropping blocks from memory and writing
them to disk because of their storage level.

To be fair, the cluster is under lots of memory pressure.

I was thinking of decreasing spark.storage.safetyFraction to give Spark more
breathing room. Is that the right way to think about this?
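
For illustration, a minimal Scala sketch of the kind of tuning in question,
assuming the legacy (pre-1.6) memory manager that Spark 1.5.2 uses; the values,
app name, and input path below are placeholders, not recommendations:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.storage.StorageLevel

  // Sketch only: with the legacy memory manager, storage memory is roughly
  // heap * spark.storage.memoryFraction * spark.storage.safetyFraction,
  // so lowering either value caps how much of the heap cached blocks may use.
  val conf = new SparkConf()
    .setAppName("unroll-oom-tuning")              // placeholder app name
    .set("spark.storage.memoryFraction", "0.5")   // default is 0.6 in 1.5.x
    .set("spark.storage.safetyFraction", "0.8")   // default is 0.9 in 1.5.x
  val sc = new SparkContext(conf)

  // Serialized caching also shrinks the in-memory footprint of stored blocks,
  // which reduces how often they must be dropped to disk under pressure.
  val lines = sc.textFile("hdfs:///path/to/input") // hypothetical input path
  lines.persist(StorageLevel.MEMORY_AND_DISK_SER)
  println(lines.count())

Note that the failed allocation in the trace below is a direct (off-heap)
buffer, so heap-side tuning like this can only help indirectly by lowering
overall memory pressure.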

The stacktrace:

java.lang.OutOfMemoryError
at sun.misc.Unsafe.allocateMemory(Native Method)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
at sun.nio.ch.IOUtil.write(IOUtil.java:58)
at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:205)
at org.apache.spark.storage.DiskStore$$anonfun$putBytes$1.apply$mcV$sp(DiskStore.scala:50)
at org.apache.spark.storage.DiskStore$$anonfun$putBytes$1.apply(DiskStore.scala:49)
at org.apache.spark.storage.DiskStore$$anonfun$putBytes$1.apply(DiskStore.scala:49)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1206)
at org.apache.spark.storage.DiskStore.putBytes(DiskStore.scala:52)
at org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1043)
at org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1002)
at org.apache.spark.storage.MemoryStore$$anonfun$ensureFreeSpace$4.apply(MemoryStore.scala:468)
at org.apache.spark.storage.MemoryStore$$anonfun$ensureFreeSpace$4.apply(MemoryStore.scala:457)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.storage.MemoryStore.ensureFreeSpace(MemoryStore.scala:457)
at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:292)
at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Thanks,
*Thomas Gerber*
Director of Data Engineering



Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-01 Thread Reynold Xin
Hi Sean,

(writing this email with my Apache hat on only and not Databricks hat)

The preview release is available here:
http://spark.apache.org/downloads.html (there is an entire section
dedicated to it and also there is a news link to it on the right).

Again, I think this is a good opportunity to define what a release should
contain. Based on
http://www.apache.org/dev/release.html#where-do-releases-go

"In addition to the distribution directory, project that use Maven or a
related build tool sometimes place their releases on repository.apache.org
beside some convenience binaries. The distribution directory is required,
while the repository system is an optional convenience."

So I'm reading it as saying that Maven publication is not necessary. My
understanding is that the general community (beyond those who follow the dev
list) should understand that the preview is not a stable release, and we as
the PMC should set expectations accordingly. Developers who can test the
preview releases tend to be more savvy and are comfortable on the bleeding
edge. It is actually fairly easy for them to add a Maven repo. Rereading
the page, I realized that nowhere on it do we mention the temporary Maven
repo. I will fix that.

I think it'd be pretty bad if preview releases in any way become the "default
version", because they are unstable and contain a lot of blocker bugs.

So my concrete proposal is:

1. Separate (officially voted) releases into normal and preview.

2. On the download page, have two sections. One listing the normal
releases, and the other listing preview releases.

3. Everywhere we mention preview releases, include the proper disclaimer
e.g. "This preview is not a stable release in terms of either API or
functionality, but it is meant to give the community early access to try
the code that will become Spark 2.0."

4. Publish normal releases to maven central, and preview releases only to
the staging maven repo. But of course we should include the temporary maven
repo for preview releases on the download page.






On Wed, Jun 1, 2016 at 3:10 PM, Sean Owen  wrote:

> I'll be more specific about the issue that I think trumps all this,
> which I realize maybe not everyone was aware of.
>
> There was a long and contentious discussion on the PMC about, among
> other things, advertising a "Spark 2.0 preview" from Databricks, such
> as at
> https://databricks.com/blog/2016/05/11/apache-spark-2-0-technical-preview-easier-faster-and-smarter.html
>
> That post has already been updated/fixed from an earlier version, but
> part of the resolution was to make a full "2.0.0 preview" release in
> order to continue to be able to advertise it as such. Without it, I
> believe the PMC's conclusion remains that this blog post / product
> announcement is not allowed by ASF policy. Hence, either the product
> announcements need to be taken down and a bunch of wording changed in
> the Databricks product, or, this needs to be a normal release.
>
> Obviously, it seems far easier to just finish the release per usual. I
> actually didn't realize this had not been offered for download at
> http://spark.apache.org/downloads.html either. It needs to be
> accessible there too.
>
>
> We can get back in the weeds about what a "preview" release means,
> but, normal voted releases can and even should be alpha/beta
> (http://www.apache.org/dev/release.html) The culture is, in theory, to
> release early and often. I don't buy an argument that it's too old, at
> 2 weeks, when the alternative is having nothing at all to test
> against.
>
> On Wed, Jun 1, 2016 at 5:02 PM, Michael Armbrust 
> wrote:
> >> I'd think we want less effort, not more, to let people test it? for
> >> example, right now I can't easily try my product build against
> >> 2.0.0-preview.
> >
> >
> > I don't feel super strongly one way or the other, so if we need to
> publish
> > it permanently we can.
> >
> > However, either way you can still test against this release.  You just
> need
> > to add a resolver as well (which is how I have always tested packages
> > against RCs).  One concern with making it permeant is this preview
> release
> > is already fairly far behind branch-2.0, so many of the issues that
> people
> > might report have already been fixed and that might continue even after
> the
> > release is made.  I'd rather be able to force upgrades eventually when we
> > vote on the final 2.0 release.
> >
>


Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-01 Thread Sean Owen
I'll be more specific about the issue that I think trumps all this,
which I realize maybe not everyone was aware of.

There was a long and contentious discussion on the PMC about, among
other things, advertising a "Spark 2.0 preview" from Databricks, such
as at 
https://databricks.com/blog/2016/05/11/apache-spark-2-0-technical-preview-easier-faster-and-smarter.html

That post has already been updated/fixed from an earlier version, but
part of the resolution was to make a full "2.0.0 preview" release in
order to continue to be able to advertise it as such. Without it, I
believe the PMC's conclusion remains that this blog post / product
announcement is not allowed by ASF policy. Hence, either the product
announcements need to be taken down and a bunch of wording changed in
the Databricks product, or, this needs to be a normal release.

Obviously, it seems far easier to just finish the release per usual. I
actually didn't realize this had not been offered for download at
http://spark.apache.org/downloads.html either. It needs to be
accessible there too.


We can get back in the weeds about what a "preview" release means,
but normal voted releases can and even should be alpha/beta
(http://www.apache.org/dev/release.html). The culture is, in theory, to
release early and often. I don't buy the argument that it's too old at
2 weeks, when the alternative is having nothing at all to test
against.

On Wed, Jun 1, 2016 at 5:02 PM, Michael Armbrust  wrote:
>> I'd think we want less effort, not more, to let people test it? for
>> example, right now I can't easily try my product build against
>> 2.0.0-preview.
>
>
> I don't feel super strongly one way or the other, so if we need to publish
> it permanently we can.
>
> However, either way you can still test against this release.  You just need
> to add a resolver as well (which is how I have always tested packages
> against RCs).  One concern with making it permeant is this preview release
> is already fairly far behind branch-2.0, so many of the issues that people
> might report have already been fixed and that might continue even after the
> release is made.  I'd rather be able to force upgrades eventually when we
> vote on the final 2.0 release.
>




Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-01 Thread Michael Armbrust
>
> I'd think we want less effort, not more, to let people test it? for
> example, right now I can't easily try my product build against
> 2.0.0-preview.


I don't feel super strongly one way or the other, so if we need to publish
it permanently we can.

However, either way you can still test against this release. You just need
to add a resolver as well (which is how I have always tested packages
against RCs). One concern with making it permanent is that this preview
release is already fairly far behind branch-2.0, so many of the issues that
people might report have already been fixed, and that might continue even
after the release is made. I'd rather be able to force upgrades eventually
when we vote on the final 2.0 release.
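
For anyone trying that, a hedged build.sbt sketch of what adding such a
resolver could look like; the staging repository URL is the orgapachespark-1182
one cited elsewhere in this thread, and the specific modules listed are only an
example:

  // Sketch: resolve the 2.0.0-preview artifacts from the ASF staging
  // repository rather than Maven Central.
  resolvers += "Apache Spark 2.0.0-preview staging" at
    "https://repository.apache.org/content/repositories/orgapachespark-1182/"

  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "2.0.0-preview" % "provided",
    "org.apache.spark" %% "spark-sql"  % "2.0.0-preview" % "provided"
  )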


Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-01 Thread Marcelo Vanzin
On Wed, Jun 1, 2016 at 2:51 PM, Sean Owen  wrote:
> I'd think we want less effort, not more, to let people test it? for
> example, right now I can't easily try my product build against
> 2.0.0-preview.

While I understand your point of view, I like the extra effort needed to get
to these artifacts, because it prevents people from easily building
their applications on top of what is known to be an unstable release
(either API-wise or quality-wise).

I see this preview release more like a snapshot release that was voted
on for wide testing than like a proper release that we want to
encourage people to build on. And, as with snapshots, I like that to use
it in your application you have to go out of your way and add a
separate repository instead of just changing a version string or
command-line argument.

My 2 bits.

-- 
Marcelo




Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-01 Thread Sean Owen
An RC is something that gets voted on, and the final one is turned
into a blessed release. I agree that RCs don't get published to Maven
Central, but releases do of course.

This was certainly to be an official release, right? A beta or alpha
can still be an official, published release. The proximate motivation
was to solve a problem of advertising "Apache Spark 2.0.0 preview" in
a product, when no such release existed from the ASF. Hence the point
was to produce a full regular release, and I think that needs to
include the usual Maven artifacts.

I'd think we want less effort, not more, to let people test it? For
example, right now I can't easily try my product build against
2.0.0-preview.

On Wed, Jun 1, 2016 at 3:53 PM, Marcelo Vanzin  wrote:
> So are RCs, aren't they?
>
> Personally I'm fine with not releasing to maven central. Any extra
> effort needed by regular users to use a preview / RC is good with me.
>
> On Wed, Jun 1, 2016 at 1:50 PM, Reynold Xin  wrote:
>> To play devil's advocate, previews are technically not RCs. They are
>> actually voted releases.
>>
>> On Wed, Jun 1, 2016 at 1:46 PM, Michael Armbrust 
>> wrote:
>>>
>>> Yeah, we don't usually publish RCs to central, right?
>>>
>>> On Wed, Jun 1, 2016 at 1:06 PM, Reynold Xin  wrote:

 They are here ain't they?

 https://repository.apache.org/content/repositories/orgapachespark-1182/

 Did you mean publishing them to maven central? My understanding is that
 publishing to maven central isn't a required step of doing theses. This
 might be a good opportunity to discuss that. My thought is that it is since
 Maven central is immutable, and the purposes of the preview releases are to
 get people to test it early on in preparation for the actual release, it
 might be better to not publish preview releases to maven central. Users
 testing with preview releases can just use the temporary repository above.




 On Wed, Jun 1, 2016 at 11:36 AM, Sean Owen  wrote:
>
> Just checked and they are still not published this week. Can these be
> published ASAP to complete the 2.0.0-preview release?


>>>
>>
>
>
>
> --
> Marcelo




Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-01 Thread Jonathan Kelly
I think what Reynold probably means is that previews are releases for which
a vote *passed*.

~ Jonathan

On Wed, Jun 1, 2016 at 1:53 PM Marcelo Vanzin  wrote:

> So are RCs, aren't they?
>
> Personally I'm fine with not releasing to maven central. Any extra
> effort needed by regular users to use a preview / RC is good with me.
>
> On Wed, Jun 1, 2016 at 1:50 PM, Reynold Xin  wrote:
> > To play devil's advocate, previews are technically not RCs. They are
> > actually voted releases.
> >
> > On Wed, Jun 1, 2016 at 1:46 PM, Michael Armbrust  >
> > wrote:
> >>
> >> Yeah, we don't usually publish RCs to central, right?
> >>
> >> On Wed, Jun 1, 2016 at 1:06 PM, Reynold Xin 
> wrote:
> >>>
> >>> They are here ain't they?
> >>>
> >>>
> https://repository.apache.org/content/repositories/orgapachespark-1182/
> >>>
> >>> Did you mean publishing them to maven central? My understanding is that
> >>> publishing to maven central isn't a required step of doing theses. This
> >>> might be a good opportunity to discuss that. My thought is that it is
> since
> >>> Maven central is immutable, and the purposes of the preview releases
> are to
> >>> get people to test it early on in preparation for the actual release,
> it
> >>> might be better to not publish preview releases to maven central. Users
> >>> testing with preview releases can just use the temporary repository
> above.
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Jun 1, 2016 at 11:36 AM, Sean Owen  wrote:
> 
>  Just checked and they are still not published this week. Can these be
>  published ASAP to complete the 2.0.0-preview release?
> >>>
> >>>
> >>
> >
>
>
>
> --
> Marcelo
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-01 Thread Marcelo Vanzin
So are RCs, aren't they?

Personally I'm fine with not releasing to maven central. Any extra
effort needed by regular users to use a preview / RC is good with me.

On Wed, Jun 1, 2016 at 1:50 PM, Reynold Xin  wrote:
> To play devil's advocate, previews are technically not RCs. They are
> actually voted releases.
>
> On Wed, Jun 1, 2016 at 1:46 PM, Michael Armbrust 
> wrote:
>>
>> Yeah, we don't usually publish RCs to central, right?
>>
>> On Wed, Jun 1, 2016 at 1:06 PM, Reynold Xin  wrote:
>>>
>>> They are here ain't they?
>>>
>>> https://repository.apache.org/content/repositories/orgapachespark-1182/
>>>
>>> Did you mean publishing them to maven central? My understanding is that
>>> publishing to maven central isn't a required step of doing theses. This
>>> might be a good opportunity to discuss that. My thought is that it is since
>>> Maven central is immutable, and the purposes of the preview releases are to
>>> get people to test it early on in preparation for the actual release, it
>>> might be better to not publish preview releases to maven central. Users
>>> testing with preview releases can just use the temporary repository above.
>>>
>>>
>>>
>>>
>>> On Wed, Jun 1, 2016 at 11:36 AM, Sean Owen  wrote:

 Just checked and they are still not published this week. Can these be
 published ASAP to complete the 2.0.0-preview release?
>>>
>>>
>>
>



-- 
Marcelo




Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-01 Thread Reynold Xin
To play devil's advocate, previews are technically not RCs. They are
actually voted releases.

On Wed, Jun 1, 2016 at 1:46 PM, Michael Armbrust 
wrote:

> Yeah, we don't usually publish RCs to central, right?
>
> On Wed, Jun 1, 2016 at 1:06 PM, Reynold Xin  wrote:
>
>> They are here ain't they?
>>
>> https://repository.apache.org/content/repositories/orgapachespark-1182/
>>
>> Did you mean publishing them to maven central? My understanding is that
>> publishing to maven central isn't a required step of doing theses. This
>> might be a good opportunity to discuss that. My thought is that it is since
>> Maven central is immutable, and the purposes of the preview releases are to
>> get people to test it early on in preparation for the actual release, it
>> might be better to not publish preview releases to maven central. Users
>> testing with preview releases can just use the temporary repository above.
>>
>>
>>
>>
>> On Wed, Jun 1, 2016 at 11:36 AM, Sean Owen  wrote:
>>
>>> Just checked and they are still not published this week. Can these be
>>> published ASAP to complete the 2.0.0-preview release?
>>>
>>
>>
>


Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-01 Thread Michael Armbrust
Yeah, we don't usually publish RCs to central, right?

On Wed, Jun 1, 2016 at 1:06 PM, Reynold Xin  wrote:

> They are here ain't they?
>
> https://repository.apache.org/content/repositories/orgapachespark-1182/
>
> Did you mean publishing them to maven central? My understanding is that
> publishing to maven central isn't a required step of doing theses. This
> might be a good opportunity to discuss that. My thought is that it is since
> Maven central is immutable, and the purposes of the preview releases are to
> get people to test it early on in preparation for the actual release, it
> might be better to not publish preview releases to maven central. Users
> testing with preview releases can just use the temporary repository above.
>
>
>
>
> On Wed, Jun 1, 2016 at 11:36 AM, Sean Owen  wrote:
>
>> Just checked and they are still not published this week. Can these be
>> published ASAP to complete the 2.0.0-preview release?
>>
>
>


Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-01 Thread Reynold Xin
They are here ain't they?

https://repository.apache.org/content/repositories/orgapachespark-1182/

Did you mean publishing them to Maven Central? My understanding is that
publishing to Maven Central isn't a required step of doing these. This
might be a good opportunity to discuss that. My thought is that, since Maven
Central is immutable and the purpose of preview releases is to get people to
test them early in preparation for the actual release, it might be better
not to publish preview releases to Maven Central. Users testing with preview
releases can just use the temporary repository above.




On Wed, Jun 1, 2016 at 11:36 AM, Sean Owen  wrote:

> Just checked and they are still not published this week. Can these be
> published ASAP to complete the 2.0.0-preview release?
>


Spark 2.0.0-preview artifacts still not available in Maven

2016-06-01 Thread Sean Owen
Just checked and they are still not published this week. Can these be
published ASAP to complete the 2.0.0-preview release?




Re: ImportError: No module named numpy

2016-06-01 Thread Julio Antonio Soto de Vicente
Try adding the following to spark-env.sh (renaming it if it still has the
.template extension):

PYSPARK_PYTHON=/path/to/your/bin/python

Where your bin/python is the Python environment that actually has NumPy installed.


> On Jun 1, 2016, at 20:16, Bhupendra Mishra wrote:
> 
> I have numpy installed but where I should setup PYTHONPATH?
> 
> 
>> On Wed, Jun 1, 2016 at 11:39 PM, Sergio Fernández  wrote:
>> sudo pip install numpy
>> 
>>> On Wed, Jun 1, 2016 at 5:56 PM, Bhupendra Mishra 
>>>  wrote:
>>> Thanks .
>>> How can this be resolved?
>>> 
 On Wed, Jun 1, 2016 at 9:02 PM, Holden Karau  wrote:
 Generally this means numpy isn't installed on the system or your 
 PYTHONPATH has somehow gotten pointed somewhere odd,
 
> On Wed, Jun 1, 2016 at 8:31 AM, Bhupendra Mishra 
>  wrote:
> If any one please can help me with following error.
> 
>  File 
> "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
>  line 25, in <module>
> 
> ImportError: No module named numpy
> 
> 
> Thanks in advance!
 
 
 
 -- 
 Cell : 425-233-8271
 Twitter: https://twitter.com/holdenkarau
>> 
>> 
>> 
>> -- 
>> Sergio Fernández
>> Partner Technology Manager
>> Redlink GmbH
>> m: +43 6602747925
>> e: sergio.fernan...@redlink.co
>> w: http://redlink.co
> 


Re: ImportError: No module named numpy

2016-06-01 Thread Bhupendra Mishra
I have numpy installed, but where should I set up PYTHONPATH?


On Wed, Jun 1, 2016 at 11:39 PM, Sergio Fernández  wrote:

> sudo pip install numpy
>
> On Wed, Jun 1, 2016 at 5:56 PM, Bhupendra Mishra <
> bhupendra.mis...@gmail.com> wrote:
>
>> Thanks .
>> How can this be resolved?
>>
>> On Wed, Jun 1, 2016 at 9:02 PM, Holden Karau 
>> wrote:
>>
>>> Generally this means numpy isn't installed on the system or your
>>> PYTHONPATH has somehow gotten pointed somewhere odd,
>>>
>>> On Wed, Jun 1, 2016 at 8:31 AM, Bhupendra Mishra <
>>> bhupendra.mis...@gmail.com> wrote:
>>>
 If any one please can help me with following error.

  File
 "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
 line 25, in <module>

 ImportError: No module named numpy


 Thanks in advance!


>>>
>>>
>>> --
>>> Cell : 425-233-8271
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
>>
>
>
> --
> Sergio Fernández
> Partner Technology Manager
> Redlink GmbH
> m: +43 6602747925
> e: sergio.fernan...@redlink.co
> w: http://redlink.co
>


Re: ImportError: No module named numpy

2016-06-01 Thread Sergio Fernández
sudo pip install numpy

On Wed, Jun 1, 2016 at 5:56 PM, Bhupendra Mishra  wrote:

> Thanks .
> How can this be resolved?
>
> On Wed, Jun 1, 2016 at 9:02 PM, Holden Karau  wrote:
>
>> Generally this means numpy isn't installed on the system or your
>> PYTHONPATH has somehow gotten pointed somewhere odd,
>>
>> On Wed, Jun 1, 2016 at 8:31 AM, Bhupendra Mishra <
>> bhupendra.mis...@gmail.com> wrote:
>>
>>> If any one please can help me with following error.
>>>
>>>  File
>>> "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
>>> line 25, in <module>
>>>
>>> ImportError: No module named numpy
>>>
>>>
>>> Thanks in advance!
>>>
>>>
>>
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>
>
>


-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: sergio.fernan...@redlink.co
w: http://redlink.co


Re: [DISCUSS] Removing or changing maintainer process

2016-06-01 Thread Nicholas Chammas
I just heard about mention-bot at PyCon 2016:

https://github.com/facebook/mention-bot

Do you have a GitHub project that is too big for people to subscribe to all
the notifications? The mention bot will automatically mention potential
reviewers on pull requests. It helps getting faster turnaround on pull
requests by involving the right people early on.

mention-bot checks the blame history for the files modified by a PR and
automatically pings the most likely candidates for review. I wonder if it
would work well for us.

Nick

On Thu, May 19, 2016 at 11:47 AM Nicholas Chammas nicholas.cham...@gmail.com
 wrote:

I’ve also heard that we should try to keep some other instructions for
> contributors to find the “right” reviewers, so it would be great to see
> suggestions on that. For my part, I’d personally prefer something
> “automatic”, such as easily tracking who reviewed each patch and having
> people look at the commit history of the module they want to work on,
> instead of a list that needs to be maintained separately.
>
> Some code review and management tools like Phabricator have a system for
> this, where you can configure
> alerts to automatically ping certain people if a file matching some rule
> (e.g. has this extension, is in this folder, etc.) is modified by a PR.
>
> I think short of deploying Phabricator somehow, probably the most
> realistic option for us to get automatic alerts like this is to have
> someone add that as a feature to the Spark PR Dashboard.
>
> I created an issue for this some time ago if anyone wants to take a crack
> at it: https://github.com/databricks/spark-pr-dashboard/issues/47
>
> Nick
> ​
>
> On Thu, May 19, 2016 at 11:42 AM Tom Graves 
> wrote:
>
>> +1 (binding)
>>
>> Tom
>>
>>
>> On Thursday, May 19, 2016 10:35 AM, Matei Zaharia <
>> matei.zaha...@gmail.com> wrote:
>>
>>
>> Hi folks,
>>
>> Around 1.5 years ago, Spark added a maintainer process for reviewing API
>> and architectural changes (
>> https://cwiki.apache.org/confluence/display/SPARK/Committers#Committers-ReviewProcessandMaintainers)
>> to make sure these are seen by people who spent a lot of time on that
>> component. At the time, the worry was that changes might go unnoticed as
>> the project grows, but there were also concerns that this approach makes
>> the project harder to contribute to and less welcoming. Since implementing
>> the model, I think that a good number of developers concluded it doesn't
>> make a huge difference, so because of these concerns, it may be useful to
>> remove it. I've also heard that we should try to keep some other
>> instructions for contributors to find the "right" reviewers, so it would be
>> great to see suggestions on that. For my part, I'd personally prefer
>> something "automatic", such as easily tracking who reviewed each patch and
>> having people look at the commit history of the module they want to work
>> on, instead of a list that needs to be maintained separately.
>>
>> Matei
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>>
>> ​


Re: ImportError: No module named numpy

2016-06-01 Thread Bhupendra Mishra
Thanks.
How can this be resolved?

On Wed, Jun 1, 2016 at 9:02 PM, Holden Karau  wrote:

> Generally this means numpy isn't installed on the system or your
> PYTHONPATH has somehow gotten pointed somewhere odd,
>
> On Wed, Jun 1, 2016 at 8:31 AM, Bhupendra Mishra <
> bhupendra.mis...@gmail.com> wrote:
>
>> If any one please can help me with following error.
>>
>>  File
>> "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
>> line 25, in <module>
>>
>> ImportError: No module named numpy
>>
>>
>> Thanks in advance!
>>
>>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>


Re: ImportError: No module named numpy

2016-06-01 Thread Holden Karau
Generally this means numpy isn't installed on the system, or your PYTHONPATH
has somehow gotten pointed somewhere odd.

On Wed, Jun 1, 2016 at 8:31 AM, Bhupendra Mishra  wrote:

> If any one please can help me with following error.
>
>  File
> "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
> line 25, in <module>
>
> ImportError: No module named numpy
>
>
> Thanks in advance!
>
>


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


ImportError: No module named numpy

2016-06-01 Thread Bhupendra Mishra
If anyone could please help me with the following error.

 File
"/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
line 25, in <module>

ImportError: No module named numpy


Thanks in advance!


Re: Windows Rstudio to Linux spakR

2016-06-01 Thread Sun Rui
Selvam,

First, deploy a Spark distribution on your Windows machine that is the same
version as the Spark in your Linux cluster.

Second, follow the instructions at
https://github.com/apache/spark/tree/master/R#using-sparkr-from-rstudio.
Specify the Spark master URL of your Linux Spark cluster when calling
sparkR.init(). I don't know your Spark cluster's deployment mode; if it is
YARN, you may have to copy the YARN conf files from your cluster and set the
YARN_CONF_DIR environment variable to point to them.

These steps are my personal understanding; I have not tested this scenario.
Please report back if you have any problems.

> On Jun 1, 2016, at 16:55, Selvam Raman  wrote:
> 
> Hi ,
> 
> How to connect to sparkR (which is available in Linux env) using 
> Rstudio(Windows env).
> 
> Please help me.
> 
> -- 
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"