Hi Reynold,
Is there any way to know when an executor will no longer have any tasks?
It seems to me there is no timeout value that is both long enough
to ensure that no more tasks will be scheduled on the executor, and short
enough to be appropriate to wait on during an interactive
Spark has hit one of the eternal problems of OSS projects, one hit by: ant,
maven, hadoop, ... anything with a plugin model.
Take in the plugin: you're in control, but you're also on the hook for maintenance.
Leave out the plugin: other people can maintain it, be more agile, etc.
But you've lost control,
I have worked with various ASF projects for 4+ years now. Sure, ASF
projects can delete code as they see fit. But this is the first time I
have really seen code being "moved out" of a project without discussion. I
am sure you can do this without violating ASF policy, but the explanation
for that
Can anyone pass the Spark build with Scala 2.10?
[info] Compiling 475 Scala sources and 78 Java sources to
/Users/jzhang/github/spark/core/target/scala-2.10/classes...
[error]
/Users/jzhang/github/spark/core/src/main/scala/org/apache/spark/deploy/mesos/MesosExternalShuffleService.scala:30:
There is no way to really know that, because users might run queries at any
given point.
BTW why can't your threads be just daemon threads?
On Wed, Mar 16, 2016 at 3:29 PM, Dan Burkert wrote:
> Hi Reynold,
>
> Is there any way to know when an executor will no longer have
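For reference, a minimal Scala sketch of the daemon-thread suggestion (the
pool and factory names are illustrative, not from the Kudu connector):

import java.util.concurrent.{Executors, ThreadFactory}

// Daemon threads do not keep the JVM alive, so the application can exit
// without explicitly shutting the pool down.
val daemonFactory = new ThreadFactory {
  override def newThread(r: Runnable): Thread = {
    val t = new Thread(r)
    t.setDaemon(true)
    t
  }
}
val connectorPool = Executors.newCachedThreadPool(daemonFactory)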
On Thu, Mar 17, 2016 at 2:55 PM, Cody Koeninger wrote:
> Why would a PMC vote be necessary on every code deletion?
>
Certainly PMC votes are not necessary on *every* code deletion. I don't
think there is a very clear rule on when such discussion is warranted, just
a soft
We made that fork to hide package private classes/members in the generated
Java API doc. Otherwise, the Java API doc is very messy. The patch is to
map all private[*] to the default scope in the generated Java code.
However, this might not be the expected behavior for other packages. So it
didn't
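For illustration, a qualified-private member like the sketch below compiles
to public bytecode, which is why an unpatched genjavadoc surfaces it in the
Java API doc; the fork maps such members to Java's default (package-private)
scope instead (the class and method names here are made up):

package org.apache.spark.util

// Visible anywhere under org.apache.spark, but not part of the public API.
// In the emitted bytecode this is public, so a doc generator has to
// special-case it to keep it out of the generated Java API doc.
private[spark] class InternalHelper {
  private[spark] def internalDetail(): Int = 42
}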
Hi Steve,
I referenced the ShutdownHookManager in my original message, but it appears
to be an internal-only API. Looks like it uses a Hadoop equivalent
internally, though, so I'll look into using that. Good tip about timeouts,
thanks.
- Dan
On Thu, Mar 17, 2016 at 5:02 AM, Steve Loughran
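A minimal sketch of registering cleanup with Hadoop's hook manager (the
client handle and the priority value are illustrative, and as noted
elsewhere in this thread the API is not tagged @Public):

import org.apache.hadoop.util.ShutdownHookManager

// Hooks with higher priority run earlier during shutdown.
ShutdownHookManager.get().addShutdownHook(new Runnable {
  override def run(): Unit = kuduClient.close()  // hypothetical client handle
}, 50)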
There's a difference between "without discussion" and "without as much
discussion as I would have liked to have a chance to notice it".
There are plenty of PRs that got merged before I noticed them that I
would rather not have gotten merged.
As far as group / artifact name compatibility, at least
On Thu, Mar 17, 2016 at 12:01 PM, Cody Koeninger wrote:
> i. An ASF project can clearly decide that some of its code is no
> longer worth maintaining and delete it. This isn't really any
> different. It's still apache licensed so ultimately whoever wants the
> code can get
On Fri, Mar 18, 2016 at 10:09 AM, Jean-Baptiste Onofré
wrote:
> a project can have multiple repos: it's what we have in ServiceMix, in
> Karaf.
> For the *-extra on github, if the code has been in the ASF, the PMC members
> have to vote to move the code on *-extra.
That's
Maybe just add a watchdog thread and close the connection upon some
timeout?
On Wednesday, March 16, 2016, Dan Burkert wrote:
> Hi all,
>
> I'm working on the Spark connector for Apache Kudu, and I've run into an
> issue that is a bit beyond my Spark knowledge. The Kudu
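A rough Scala sketch of the watchdog idea, assuming a hypothetical client
that tracks when it was last used (all names and timeouts are illustrative):

// Close the connection once it has been idle past a timeout.
val watchdog = new Thread(new Runnable {
  override def run(): Unit = {
    while (!client.isClosed) {
      if (System.currentTimeMillis() - client.lastUsedMillis > 60000L) {
        client.close()
      }
      Thread.sleep(5000L)
    }
  }
})
watchdog.setDaemon(true) // so it doesn't keep the JVM alive either
watchdog.start()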
Hello all,
Recently a lot of the streaming backends were moved to a separate
project on github and removed from the main Spark repo.
While I think the idea is great, I'm a little worried about the
execution. Some concerns were already raised on the bug mentioned
above, but I'd like to have a
No, I didn't yet - feel free to create a JIRA.
On Thu, 17 Mar 2016 at 22:55 Daniel Siegmann
wrote:
> Hi Nick,
>
> Thanks again for your help with this. Did you create a ticket in JIRA for
> investigating sparse models in LR and / or multivariate summariser? If so,
+1 on Marcelo's comments. It would be nice not to pollute commit messages
with the instructions because some people might forget to remove them.
Nobody has suggested removing the template.
On Tue, Mar 15, 2016 at 3:59 PM, Joseph Bradley
wrote:
> +1 for keeping the
I was not aware of a discussion on the dev list about this - agree with most of
the observations.
In addition, I did not see PMC signoff on moving (sub-)modules out.
Regards
Mridul
On Thursday, March 17, 2016, Marcelo Vanzin wrote:
> Hello all,
>
> Recently a lot of the
hi everyone,
I've recently gotten moving on solving some of the low-level data
interoperability problems between Python's NumPy-focused scientific
computing and data libraries like pandas and the rest of the big data
ecosystem, Spark being a very important part of that.
One of the major efforts
I tried again this morning:
$ wget
https://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.6.tgz
--2016-03-18 07:55:30--
https://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.6.tgz
Resolving s3.amazonaws.com... 54.231.19.163
...
$ tar zxf
There are several open JIRAs to add new Sinks:
OpenTSDB https://issues.apache.org/jira/browse/SPARK-12194
StatsD https://issues.apache.org/jira/browse/SPARK-11574
Kafka https://issues.apache.org/jira/browse/SPARK-13392
Some have PRs from 2015 so I'm assuming there is not the desire to
integrate
https://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.6.tgz
Does anyone else have trouble unzipping this? How did this happen?
What I get is:
$ gzip -t spark-1.6.1-bin-hadoop2.6.tgz
gzip: spark-1.6.1-bin-hadoop2.6.tgz: unexpected end of file
gzip:
Hi Spark experts,
I am using Spark 1.5.2 on YARN with dynamic allocation enabled. I see in
the driver/application master logs that the app is marked as SUCCEEDED and
then SparkContext stop is called. However, this stop sequence takes > 10
minutes to complete, and YARN resource manager kills the
> On Mar 19, 2016, at 8:32 AM, Steve Loughran wrote:
>
>
>> On 18 Mar 2016, at 17:07, Marcelo Vanzin wrote:
>>
>> Hi Steve, thanks for the write up.
>>
>> On Fri, Mar 18, 2016 at 3:12 AM, Steve Loughran
>> wrote:
>>> If
Marcelo Vanzin wrote earlier:
> Recently a lot of the streaming backends were moved to a separate
> project on github and removed from the main Spark repo.
Question: why was the code removed from the Spark repo? What's the harm
in keeping it available here?
The ASF is perfectly happy if anyone
Hi Marcelo,
a project can have multiple repos: it's what we have in ServiceMix, in
Karaf.
For the *-extra on github, if the code has been in the ASF, the PMC
members have to vote to move the code on *-extra.
Regards
JB
On 03/18/2016 06:07 PM, Marcelo Vanzin wrote:
Hi Steve, thanks for
+1
On Mar 19, 2016 08:33, "Pete Robbins" wrote:
> This seems to me to be unnecessarily restrictive. These are very useful
> extension points for adding 3rd party sources and sinks.
>
> I intend to make an Elasticsearch sink available on spark-packages but
> this will require
> On 18 Mar 2016, at 22:24, Marcelo Vanzin wrote:
>
> On Fri, Mar 18, 2016 at 2:12 PM, chrismattmann wrote:
>> So, my comment here is that any code *cannot* be removed from an Apache
>> project if there is a VETO issued which so far I haven't seen,
> On 18 Mar 2016, at 17:07, Marcelo Vanzin wrote:
>
> Hi Steve, thanks for the write up.
>
> On Fri, Mar 18, 2016 at 3:12 AM, Steve Loughran
> wrote:
>> If you want a separate project, eg. SPARK-EXTRAS, then it *generally* needs
>> to go through
> On 17 Mar 2016, at 21:33, Marcelo Vanzin wrote:
>
> Hi Reynold, thanks for the info.
>
> On Thu, Mar 17, 2016 at 2:18 PM, Reynold Xin wrote:
>> If one really feels strongly that we should go through all the overhead to
>> setup an ASF subproject for
On Linux, I got:
$ tar zxf spark-1.6.1-bin-hadoop2.6.tgz
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
On Wed, Mar 16, 2016 at 5:15 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:
>
>
Code can be removed from an ASF project.
That code can live on elsewhere (in accordance with the license).
It can't be presented as part of the official ASF project, like any
other 3rd party project.
The package name certainly must change from org.apache.spark.
I don't know of a protocol, but
Thanks for the replies, responses inline:
On Wed, Mar 16, 2016 at 3:36 PM, Reynold Xin wrote:
> There is no way to really know that, because users might run queries at
> any given point.
>
> BTW why can't your threads be just daemon threads?
>
The bigger issue is that we
Note the non-kafka bug was filed right before the change was pushed.
So there really wasn't any discussion before the decision was made to
remove that code.
I'm just trying to merge both discussions here in the list where it's
a little bit more dynamic than bug updates that end up getting lost in
Hi,
Scala version: 2.11.7 (had to upgrade the Scala version to enable case
classes to accept more than 22 parameters.)
Spark version: 1.6.1.
PFB pom.xml
Getting the below error when trying to set up Spark in the IntelliJ IDE:
16/03/16 18:36:44 INFO spark.SparkContext: Running Spark version 1.6.1
See the instructions in the Spark documentation:
https://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211
On Wed, Mar 16, 2016 at 7:05 PM satyajit vegesna
wrote:
>
>
> Hi,
>
> Scala version: 2.11.7 (had to upgrade the Scala version to enable case
Hi all,
I'm working on the Spark connector for Apache Kudu, and I've run into an
issue that is a bit beyond my Spark knowledge. The Kudu connector
internally holds an open connection to the Kudu cluster
Hello all,
I would like to bring your attention to a small project to integrate
TensorFlow with Apache Spark, called TensorFrames. With this library, you
can map, reduce or aggregate numerical data stored in Spark dataframes
using TensorFlow computation graphs. It is published as a Spark package
Also, just wanted to point out something:
On Thu, Mar 17, 2016 at 2:18 PM, Reynold Xin wrote:
> Thanks for initiating this discussion. I merged the pull request because it
> was unblocking another major piece of work for Spark 2.0: not requiring
> assembly jars
While I do
Err, whoops, looks like this is a user app and not building Spark itself,
so you'll have to change your deps to use the 2.11 versions of Spark.
e.g. spark-streaming_2.10 -> spark-streaming_2.11.
On Wed, Mar 16, 2016 at 7:07 PM Josh Rosen wrote:
> See the instructions
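For reference, a minimal build.sbt sketch: with %%, sbt appends the Scala
binary version, so setting scalaVersion to 2.11.x resolves
spark-streaming_2.11 instead of spark-streaming_2.10 (versions shown are
just the ones from this thread):

scalaVersion := "2.11.7"

// %% resolves to spark-streaming_2.11 under a 2.11.x scalaVersion.
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.6.1"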
I think it'd make sense to have the merge script automatically remove some
parts of the template, if they were not removed by the contributor. That
seems trivial to do.
On Tue, Mar 15, 2016 at 3:59 PM, Joseph Bradley
wrote:
> +1 for keeping the template
>
> I figure any
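A sketch of what that stripping could look like (the actual merge script is
Python; this is an illustrative Scala version, and the template strings are
assumptions, not the real template text):

// Drop known PR-template boilerplate lines the contributor forgot to delete.
val templateBoilerplate = Set(
  "## What changes were proposed in this pull request?",
  "## How was this patch tested?"
)

def stripTemplate(commitMessage: String): String =
  commitMessage.split("\n")
    .filterNot(line => templateBoilerplate.contains(line.trim))
    .mkString("\n")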
OK cool. I'll test the hadoop-2.6 package and check back here if it's still
broken.
Just curious: How did those packages all get corrupted (if we know)? Seems
like a strange thing to happen.
On Thu, Mar 17, 2016 at 11:57 AM, Michael Armbrust wrote:
> Patrick reuploaded the
We use it in executors to get to:
a) spark conf (for getting to the hadoop config in a map doing custom
writing of side-files)
b) Shuffle manager (to get shuffle reader)
Not sure if there are alternative ways to get to these.
Regards,
Mridul
On Wed, Mar 16, 2016 at 2:52 PM, Reynold Xin
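Assuming "it" here is SparkEnv, the executor-side accesses in question look
roughly like this (Spark 1.x internals, not a stable public API):

import org.apache.spark.SparkEnv

val conf = SparkEnv.get.conf                     // (a) spark conf
val shuffleManager = SparkEnv.get.shuffleManager // (b) to get a shuffle reader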
Thanks for initiating this discussion. I merged the pull request because it
was unblocking another major piece of work for Spark 2.0: not requiring
assembly jars, which is arguably a lot more important than sources that are
less frequently used. I take full responsibility for that.
I think it's
This seems to me to be unnecessarily restrictive. These are very useful
extension points for adding 3rd party sources and sinks.
I intend to make an Elasticsearch sink available on spark-packages but this
will require a single class, the sink, to be in the org.apache.spark
package tree. I could
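A sketch of why the package tree matters: the Sink trait is private[spark],
so a third-party sink has to be declared under org.apache.spark to extend it
(the Elasticsearch details here are placeholders):

package org.apache.spark.metrics.sink

import java.util.Properties
import com.codahale.metrics.MetricRegistry
import org.apache.spark.SecurityManager

// Metrics sinks are constructed reflectively with this three-arg signature.
class ElasticsearchSink(
    property: Properties,
    registry: MetricRegistry,
    securityMgr: SecurityManager) extends Sink {
  override def start(): Unit = { /* open the ES client, start a reporter */ }
  override def stop(): Unit = { /* flush and close */ }
  override def report(): Unit = { /* push a snapshot of the registry */ }
}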
I am not referring to code edits - but to migrating submodules and
code currently in Apache Spark to 'outside' of it.
If I understand correctly, assets from Apache Spark are being moved
out of it into thirdparty external repositories - not owned by Apache.
At a minimum, dev@ discussion (like this
Looks like the other packages may also be corrupt. I’m getting the same
error for the Spark 1.6.1 / Hadoop 2.4 package.
https://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.4.tgz
Nick
On Wed, Mar 16, 2016 at 8:28 PM Ted Yu wrote:
> On Linux, I got:
>
Same with the hadoop 2.3 tarball:
$ tar zxf spark-1.6.1-bin-hadoop2.3.tgz
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
On Wed, Mar 16, 2016 at 5:47 PM, Nicholas Chammas <
nicholas.cham...@gmail.com>
On 17 Mar 2016, at 17:46, Dan Burkert
> wrote:
Looks like it uses a Hadoop equivalent internally, though, so I'll look into
using that. Good tip about timeouts, thanks.
Don't think that's actually tagged as @Public, but it would upset too many
Hi Nick,
Thanks again for your help with this. Did you create a ticket in JIRA for
investigating sparse models in LR and / or multivariate summariser? If so,
can you give me the issue key(s)? If not, would you like me to create these
tickets?
I'm going to look into this some more and see if I
So, my comment here is that any code *cannot* be removed from an Apache
project if there is a VETO issued, which so far I haven't seen, though maybe
Marcelo can clarify that.
However, if a VETO was issued, then the code cannot be removed and must be
put back. Anyone can fork anything our license
The build was broken as of this morning.
Created PR:
https://github.com/apache/spark/pull/11787
On Wed, Mar 16, 2016 at 11:46 PM, Jeff Zhang wrote:
> Can anyone pass the Spark build with Scala 2.10?
>
>
> [info] Compiling 475 Scala sources and 78 Java sources to
>
If the intention is to actually decouple these connectors and give them a
life of their own, I would have expected that they would still be hosted as
different git repositories inside Apache, even though users will not really
see much difference as they would still be mirrored on GitHub. This makes
it