+1
Thanks
Shivaram
On Thu, Jul 30, 2020 at 11:56 PM Wenchen Fan wrote:
>
> +1, thanks for driving it, Holden!
>
> On Fri, Jul 31, 2020 at 10:24 AM Holden Karau wrote:
>>
>> +1 from myself :)
>>
>> On Thu, Jul 30, 2020 at 2:53 PM Jungtaek Lim
>> wrote:
>>>
>>> +1 (non-binding, I guess)
>>>
Hi all
Just wanted to check if there are any remaining blockers that we are
waiting on before starting the new release process.
Thanks
Shivaram
On Sun, Jul 5, 2020, 06:51 wuyi wrote:
> Ok, after having another look, I think it only affects local cluster deploy
> mode, which is for testing only.
>
>
>
>> https://issues.apache.org/jira/browse/SPARK-32136
>>
>>
>>
>> Thanks,
>>
>> Jason.
>>
>>
>>
>> From: Jungtaek Lim
>> Date: Wednesday, 1 July 2020 at 10:20 am
>> To: Shivaram Venkataraman
>> Cc: "Prashant Sharma"; "Jungtaek Lim" <kabhwan.opensou...@gmail.com>; "Jules Damji"; "Holden Karau"; "Reynold Xin"; "Shivaram Venkataraman"; "Yuanjian Li" <xyliyuanj...@gmail.com>; "Spark dev list"
+1 Thanks Yuanjian -- I think it'll be great to have a 3.0.1 release soon.
Shivaram
On Tue, Jun 23, 2020 at 3:43 AM Takeshi Yamamuro wrote:
>
> Thanks for the heads-up, Yuanjian!
>
> > I also noticed branch-3.0 already has 39 commits after Spark 3.0.0.
> wow, the updates are so quick. Anyway,
, 2019 at 11:27 AM Shivaram Venkataraman
wrote:
>
> Actually I found this while I was uploading the latest release to CRAN
> -- these docs should be generated as a part of the release process
> though and shouldn't be related to CRAN.
>
> On Wed, May 8, 2019 at 11:24 AM Sean Owen
it due to the
> additional CRAN processes.
>
> On Wed, May 8, 2019 at 11:23 AM Shivaram Venkataraman
> wrote:
> >
> > I just noticed that the SparkR API docs are missing at
> > https://spark.apache.org/docs/latest/api/R/index.html --- It looks
> > like they were miss
I just noticed that the SparkR API docs are missing at
https://spark.apache.org/docs/latest/api/R/index.html --- It looks
like they were missing from the 2.4.3 release?
Thanks
Shivaram
is that if there have not been too many
changes since 2.3.3, how much effort would it be to cut a 2.3.4 with
just this change.
Thanks
Shivaram
-- Forwarded message --
From: Uwe Ligges
Date: Sun, Feb 17, 2019 at 12:28 PM
Subject: Re: CRAN submission SparkR 2.3.3
To: Shivaram Venkataraman, CRAN
Those speedups look awesome! Great work Hyukjin!
Thanks
Shivaram
On Sat, Feb 9, 2019 at 7:41 AM Hyukjin Kwon wrote:
>
> Guys, as continuation of Arrow optimization for R DataFrame to Spark
> DataFrame,
>
> I am trying to make a vectorized gapply[Collect] implementation as an
> experiment like
Thanks Hyukjin! Very cool results
Shivaram
On Fri, Nov 9, 2018 at 10:58 AM Felix Cheung wrote:
>
> Very cool!
>
>
>
> From: Hyukjin Kwon
> Sent: Thursday, November 8, 2018 10:29 AM
> To: dev
> Subject: Arrow optimization in conversion from R DataFrame to Spark
> From: Sean Owen
> Sent: Tuesday, November 6, 2018 10:51 AM
> To: Shivaram Venkataraman
> Cc: Felix Cheung; Wenchen Fan; Matei Zaharia; dev
> Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
>
> I think the second option, to skip the tests, is best right now, if
elease of 2.4.0
>
>
>
> From: Wenchen Fan
> Sent: Tuesday, November 6, 2018 8:51 AM
> To: Felix Cheung
> Cc: Matei Zaharia; Sean Owen; Spark dev list; Shivaram Venkataraman
> Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
>
> Do you mea
Yep. That sounds good to me.
On Tue, Nov 6, 2018 at 11:06 AM Sean Owen wrote:
>
> Sounds good, remove in 3.1? I can update accordingly.
>
> On Tue, Nov 6, 2018, 10:46 AM Reynold Xin wrote:
>> Maybe deprecate and remove in next version? It is bad to just remove a
>> method without deprecation notice.
Sounds good to me as well. Thanks Shane.
Shivaram
On Fri, Aug 10, 2018 at 1:40 PM Reynold Xin wrote:
>
> SGTM
>
> On Fri, Aug 10, 2018 at 1:39 PM shane knapp wrote:
>>
>> https://issues.apache.org/jira/browse/SPARK-25089
>>
>> basically since these branches are old, and there will be a greater
m
>
> On Monday, July 9, 2018, 4:50:18 PM CDT, Shivaram Venkataraman
> wrote:
>
>
> Yes. I think Felix checked in a fix to ignore tests run on java
> versions that are not Java 8 (I think the fix was in
> https://github.com/apache/spark/pull/21666 which is in 2.3.2)
>
Java 9. Spark doesn't
> support that. Is there any way to tell CRAN this should not be tested?
>
> On Mon, Jul 9, 2018, 4:17 PM Shivaram Venkataraman
> wrote:
>>
>> The upcoming 2.2.2 release was submitted to CRAN. I think there are
>> some known issues on
rvice
Flavor: r-devel-linux-x86_64-debian-gcc, r-devel-windows-ix86+x86_64
Check: CRAN incoming feasibility, Result: WARNING
Maintainer: 'Shivaram Venkataraman '
New submission
Package was archived on CRAN
Insufficient package version (submitted: 2.2.2, existing: 2.3.0)
Possibly mis-spe
e Oracle JDK?
>
> ____
> From: Shivaram Venkataraman
> Sent: Tuesday, June 12, 2018 3:17:52 PM
> To: dev
> Cc: Felix Cheung
> Subject: Fwd: [CRAN-pretest-archived] CRAN submission SparkR 2.3.1
>
> Corresponding to the Spark 2.3.1 release, I submitted the SparkR build
> to C
evel-windows-ix86+x86_64
Check: CRAN incoming feasibility, Result: NOTE
Maintainer: 'Shivaram Venkataraman '
New submission
Package was archived on CRAN
Possibly mis-spelled words in DESCRIPTION:
Frontend (4:10, 5:28)
CRAN repository db overrides:
X-CRAN-Comment: Archived
Hossein -- Can you clarify what the resolution was on the repository /
release issue discussed in the SPIP?
Shivaram
On Thu, May 31, 2018 at 9:06 AM, Felix Cheung wrote:
> +1
> With my concerns in the SPIP discussion.
>
>
> From: Hossein
> Sent: Wednesday, May 30,
51
>
> On Tue, May 29, 2018 at 1:52 PM, Shivaram Venkataraman
> wrote:
>>
>> Yes. That is correct
>>
>> Shivaram
>>
>> On Tue, May 29, 2018 at 11:48 AM, Hossein wrote:
>> > I guess this relates to our conversation on the SPIP. When this happe
Yes. That is correct
Shivaram
On Tue, May 29, 2018 at 11:48 AM, Hossein wrote:
> I guess this relates to our conversation on the SPIP. When this happens, do
> we wait for a new minor release to submit it to CRAN again?
>
> --Hossein
>
> On Fri, May 25, 2018 at 5:11 PM, Felix Cheung
> wrote:
+1 We had a SparkR fix for CRAN SystemRequirements that will also be good
to get out.
Shivaram
On Fri, May 11, 2018 at 12:34 PM, Henry Robinson wrote:
> https://github.com/apache/spark/pull/21302
>
> On 11 May 2018 at 11:47, Henry Robinson wrote:
>
>> I was
>
>
>
>> - Fault tolerance and execution model: Spark assumes fine-grained
>>task recovery, i.e. if something fails, only that task is rerun. This
>>doesn’t match the execution model of distributed ML/DL frameworks that are
>>typically MPI-based, and rerunning a single task would
The problem with doing work in the callsite thread is that there are a
number of data structures that are updated during job submission and
these data structures are guarded by the event loop ensuring only one
thread accesses them. I don't think there is a very easy fix for this
given the
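A minimal sketch of the single-threaded event-loop pattern being described, modeled loosely on the DAGScheduler's internal event loop (the class and names below are illustrative, not Spark's actual API):

import java.util.concurrent.LinkedBlockingQueue

// All mutations of scheduler state happen on this one thread; callers
// post events instead of touching the shared data structures directly.
class SimpleEventLoop[E](name: String)(handle: E => Unit) {
  private val queue = new LinkedBlockingQueue[E]()
  private val thread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit = while (true) handle(queue.take())
  }
  def start(): Unit = thread.start()
  def post(event: E): Unit = queue.put(event)
}

Doing submission work on the callsite thread would bypass this single-writer guarantee, which is the difficulty referred to above.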
t should not be in the release)
>
> Thanks!
>
> _
> From: Shivaram Venkataraman <shiva...@eecs.berkeley.edu>
> Sent: Tuesday, February 20, 2018 2:24 AM
> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
> To: Felix Cheung <felixcheun...@hotmail.com>
> Cc: Sean Owen &
FWIW The search result link works for me
Shivaram
On Mon, Feb 19, 2018 at 6:21 PM, Felix Cheung
wrote:
> These are two separate things:
>
> Does the search result links work for you?
>
> The second is the dist location we are voting on has a .iml file.
>
>
Cheung <felixche...@apache.org>
>>> wrote:
>>>
>>>> This vote passes. Thanks everyone for testing this release.
>>>>
>>>>
>>>> +1:
>>>>
>>>> Sean Owen (binding)
>>>>
>>>> Herman van Hö
+1
SHA, MD5 and signatures look fine. Built and ran Maven tests on my Macbook.
Thanks
Shivaram
On Wed, Nov 29, 2017 at 10:43 AM, Holden Karau wrote:
> +1 (non-binding)
>
> PySpark install into a virtualenv works, PKG-INFO looks correctly
> populated (mostly checking for
repositories. Incorporating something that is not completely
> trusted or approved into the process of building something that we are then
> going to approve as trusted is different from the prior use of cloudfront.
>
> On Wed, Sep 13, 2017 at 10:26 AM, Shivaram Venkataram
The bucket comes from Cloudfront, a CDN thats part of AWS. There was a
bunch of discussion about this back in 2013
https://lists.apache.org/thread.html/9a72ff7ce913dd85a6b112b1b2de536dcda74b28b050f70646aba0ac@1380147885@%3Cdev.spark.apache.org%3E
Shivaram
On Wed, Sep 13, 2017 at 9:30 AM, Sean
Closely related to the PyPi upload thread (https://s.apache.org/WLtM), I
just wanted to give a heads up that we are working on submitting SparkR
from Spark 2.1.1 as a package to CRAN. The package submission is under
review with CRAN right now and I will post updates to this thread.
The main
>> https://www.appveyor.com/docs/notifications/#global-email-notifications).
>>
>> > Warning: Notifications defined on project settings UI are merged with
>> notifications defined in appveyor.yml.
>>
>> Should we maybe file an INFRA JIRA to check and ask about this?
>>
>>
>>
I'm not sure why the AppVeyor updates are coming to the dev list. Hyukjin
-- Do you know if we made any recent changes that might have caused this?
Thanks
Shivaram
-- Forwarded message --
From: AppVeyor
Date: Sat, Mar 4, 2017 at 2:46 PM
Subject: Build
FWIW there is an option to Delete the issue (in More -> Delete).
Shivaram
On Fri, Jan 13, 2017 at 8:11 AM, Shivaram Venkataraman
<shiva...@eecs.berkeley.edu> wrote:
> I can't see the resolve button either - Maybe we can forward this to
> Apache Infra and see if they can clo
I can't see the resolve button either - Maybe we can forward this to
Apache Infra and see if they can close these issues?
Shivaram
On Fri, Jan 13, 2017 at 6:35 AM, Sean Owen wrote:
> Yes, I'm asking about a specific range: 19191 - 19202. These seem to be the
> ones created
In addition to usual binary artifacts, this is the first release where
we have installable packages for Python [1] and R [2] that are part of
the release. I'm including instructions to test the R package below.
Holden / other Python developers can chime in if there are special
instructions to
+0
I am not sure how much of a problem this is but the pip packaging
seems to have changed the size of the hadoop-2.7 artifact. As you can
see in http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-bin/,
the Hadoop 2.7 build is 359M, almost double the size of the other
Hadoop
FWIW 2.0.1 is also used in the 'Link With Spark' and 'Spark Source
Code Management' sections in that page.
Shivaram
On Mon, Nov 14, 2016 at 11:11 PM, Reynold Xin wrote:
> It's on there on the page (both the release notes and the download version
> dropdown).
>
> The one
The release is available on http://www.apache.org/dist/spark/ and it's
on Maven central
http://repo1.maven.org/maven2/org/apache/spark/spark-core_2.11/2.0.2/
I guess Reynold hasn't yet put together the release notes / updates to
the website.
Thanks
Shivaram
On Mon, Nov 14, 2016 at 12:49 PM,
Do we have any query workloads for which we can benchmark these
proposals in terms of performance?
Thanks
Shivaram
On Sun, Nov 13, 2016 at 5:53 PM, Reynold Xin wrote:
> One additional note: in terms of size, the size of a count-min sketch with
> eps = 0.1% and confidence
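For a rough sense of the scale involved, the standard count-min sketch sizing formulas can be applied directly; this is a generic sketch with illustrative values, not the exact figures from the email:

// width = ceil(e / eps), depth = ceil(ln(1 / (1 - confidence)))
val eps = 0.001        // 0.1% over-count error relative to the total count
val confidence = 0.99  // probability that the error bound holds
val width = math.ceil(math.E / eps).toInt                      // 2719
val depth = math.ceil(math.log(1.0 / (1 - confidence))).toInt  // 5
val bytes = width.toLong * depth * 8                           // ~106 KB of 8-byte counters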
At the AMPLab we've been working on a research project that looks at
just the scheduling latencies and on techniques to get lower
scheduling latency. It moves away from the micro-batch model, but
reuses the fault tolerance etc. in Spark. However we haven't yet
figured out all the parts in
+1 - Given that our website is now on github
(https://github.com/apache/spark-website), I think we can move most of
our wiki into the main website. That way we'll only have two sources
of documentation to maintain: A release specific one in the main repo
and the website which is more long lived.
on isolating specific changes that are required etc.
It'd also be great to hear other approaches / next steps to concretize
some of these goals.
Thanks
Shivaram
On Thu, Oct 13, 2016 at 8:39 AM, Fred Reiss <freiss@gmail.com> wrote:
> On Tue, Oct 11, 2016 at 11:02 AM, Shivaram Venkataraman
Thanks Fred - that is very helpful.
> Delivering low latency, high throughput, and stability simultaneously: Right
> now, our own tests indicate you can get at most two of these characteristics
> out of Spark Streaming at the same time. I know of two parties that have
> abandoned Spark Streaming
Yeah I see the apache maven repos have the 2.0.1 artifacts at
https://repository.apache.org/content/repositories/releases/org/apache/spark/spark-core_2.11/
-- Not sure why they haven't synced to maven central yet
Shivaram
On Wed, Oct 5, 2016 at 8:37 PM, Luciano Resende
+1 I think having a 4 month window instead of a 3 month window sounds good.
However I think figuring out a timeline for maintenance releases would
also be good. This is a common concern that comes up in many user
threads and it'll be better to have some structure around this. It
doesn't need to
Disclaimer - I am not very closely involved with Structured Streaming
design / development, so this is just my two cents from looking at the
discussion in the linked JIRAs and PRs.
It seems to me there are a couple of issues being conflated here: (a)
is the question of how to specify or add more
I looked into this and found the problem. Will send a PR now to fix this.
If you are curious about what is happening here: When we build the
docs separately we don't have the JAR files from the Spark build in
the same tree. We added a new set of docs recently in SparkR called an
R vignette that
s to other branches on branch-1.5 and lower
>> versions, I think it'd be fine.
>>
>> One concern is, I am not sure if SparkR tests can pass on branch-1.6 (I
>> checked it passes on branch-2.0 before).
>>
>> I can try to check if it passes and identify the related caus
out to who's in charge of the account :).
>
>
> On 10 Sep 2016 12:41 a.m., "Shivaram Venkataraman"
> <shiva...@eecs.berkeley.edu> wrote:
>>
>> Thanks for debugging - I'll reply on
>> https://issues.apache.org/jira/browse/INFRA-12590 and ask for this
>
Thanks for debugging - I'll reply on
https://issues.apache.org/jira/browse/INFRA-12590 and ask for this
change.
FYI I don't think any of the committers have access to the appveyor account
which is at https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark
. To request changes that need to be
I think this makes sense -- making it easier to use additional R
packages would be a good feature. I am not sure we need Packrat for
this use case though. Lets continue discussion on the JIRA at
https://issues.apache.org/jira/browse/SPARK-17428
Thanks
Shivaram
On Tue, Sep 6, 2016 at 11:36 PM,
I think it needs a type for the elements in the array. For example
f <- structField("x", "array<string>")
Thanks
Shivaram
On Fri, Sep 2, 2016 at 8:26 AM, Paul R wrote:
> Hi there,
>
> I’ve noticed the following command in sparkR
>
> field = structField(“x”, “array”)
>
> Throws
I think takeSample itself runs multiple jobs if the amount of samples
collected in the first pass is not enough. The comment and code path
at
https://github.com/apache/spark/blob/412b0e8969215411b97efd3d0984dc6cac5d31e0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L508
should explain when
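A quick way to observe that behavior, assuming an active SparkContext sc (the numbers are arbitrary):

// With a sample size close to the total count, the first sampling pass can
// under-collect; takeSample then resubmits with a larger fraction, so one
// call may show up as several jobs in the Spark UI.
val rdd = sc.parallelize(1 to 1000, numSlices = 100)
val sample = rdd.takeSample(withReplacement = false, num = 900, seed = 42L)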
I think you can also pass in a zip file using the --files option
(http://spark.apache.org/docs/latest/running-on-yarn.html has some
examples). The files should then be present in the current working
directory of the driver R process.
Thanks
Shivaram
On Wed, Aug 17, 2016 at 4:16 AM, Felix Cheung
+1
SHA and MD5 sums match for all binaries. Docs look fine this time
around. Built and ran `dev/run-tests` with Java 7 on a linux machine.
No blocker bugs on JIRA and the only critical bug with target as 2.0.0
is SPARK-16633, which doesn't look like a release blocker. I also
checked issues which
Hashes, sigs match. I built and ran tests with Hadoop 2.3 ("-Pyarn
-Phadoop-2.3 -Phive -Pkinesis-asl -Phive-thriftserver"). I couldn't
get the following tests to pass but I think it might be something
specific to my setup as Jenkins on branch-2.0 seems quite stable.
[error] Failed tests:
[error]
I think the docs build was broken because of
https://issues.apache.org/jira/browse/SPARK-16553 - A fix has been
merged and we are testing it now
Shivaram
On Thu, Jul 14, 2016 at 1:56 PM, Matthias Niehoff
wrote:
> Some of the programming guides in the docs only
-sparkr-dev@googlegroups +dev@spark.apache.org
[Please send SparkR development questions to the Spark user / dev
mailing lists. Replies inline]
> From:
> Date: Tue, Jul 5, 2016 at 3:30 AM
> Subject: Call to new JObject sometimes returns an empty R environment
> To:
Can you open an issue on https://github.com/amplab/spark-ec2 ? I
think we should be able to escape the version string and pass the
2.0.0-preview through the scripts
Shivaram
On Tue, Jun 14, 2016 at 12:07 PM, Sunil Kumar
wrote:
> Hi,
>
> The spark-ec2 scripts are
As far as I know the process is just to copy docs/_site from the build
to the appropriate location in the SVN repo (i.e.
site/docs/2.0.0-preview).
Thanks
Shivaram
On Tue, Jun 7, 2016 at 8:14 AM, Sean Owen wrote:
> As a stop-gap, I can edit that page to have a small section
On Thu, May 12, 2016 at 2:29 PM, Reynold Xin wrote:
> We currently have three levels of interface annotation:
>
> - unannotated: stable public API
> - DeveloperApi: A lower-level, unstable API intended for developers.
> - Experimental: An experimental user-facing API.
>
>
>
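To make the three levels concrete, this is roughly how they look in source; the annotations come from org.apache.spark.annotation, while the class names are made up for illustration:

import org.apache.spark.annotation.{DeveloperApi, Experimental}

class StableThing        // unannotated: treated as a stable public API

@DeveloperApi
class SchedulerHook      // lower-level and unstable, aimed at developers

@Experimental
class ShinyNewFeature    // experimental user-facing API, may change or be removed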
I just ran the tests using a recently synced master branch and the
tests seemed to work fine. My guess is some of the Java classes
changed and you need to rebuild Spark?
Thanks
Shivaram
On Thu, Apr 28, 2016 at 1:19 PM, Gayathri Murali
wrote:
> Hi All,
>
> I am
Overall this sounds good to me. One question I have is that in
addition to the ML algorithms we have a number of linear algebra
(various distributed matrices) and statistical methods in the
spark.mllib package. Is the plan to port or move these to the spark.ml
namespace in the 2.x series?
Thanks
Yes - we should be running R tests AFAIK. That error message is a
deprecation warning about the script `bin/sparkR` which needs to be
changed in
https://github.com/apache/spark/blob/7cd7f2202547224593517b392f56e49e4c94cabc/R/run-tests.sh#L26
to bin/spark-submit.
Thanks
Shivaram
On Fri, Jan 15,
Ah I see. I wasn't aware of that PR. We should do a find and replace
in all the documentation and rest of the repository as well.
Shivaram
On Fri, Jan 15, 2016 at 3:20 PM, Reynold Xin wrote:
> +Shivaram
>
> Ah damn - we should fix it.
>
> This was broken by
The SparkR callJMethod can only invoke methods as they show up in the
Java byte code. So in this case you'll need to check the SparkContext
byte code (with javap or something like that) to see how that method
looks. My guess is the type is passed in as a class tag argument, so
you'll need to do
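To illustrate, a Scala method with a context-bound ClassTag compiles to a method with an extra trailing parameter, which a byte-code-level caller like callJMethod has to supply explicitly. A hypothetical example:

import scala.reflect.ClassTag

// Scala source:
//   def makeThing[T: ClassTag](xs: Seq[T]): Array[T] = xs.toArray
// javap on the compiled class shows the implicit as an ordinary parameter
// (scalac names it evidence$1):
//   public <T> Object makeThing(scala.collection.Seq<T> xs,
//                               scala.reflect.ClassTag<T> evidence$1);
// so the caller must construct and pass the ClassTag itself:
val intTag: ClassTag[Int] = implicitly[ClassTag[Int]]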
ne 54 still has SPARK_EC2_VERSION = "1.5.1"
>
> On Tue, Dec 1, 2015 at 12:22 AM, Shivaram Venkataraman
> <shiva...@eecs.berkeley.edu> wrote:
>>
>> Yeah we just need to add 1.5.2 as in
>>
>> https://github.com/apache/spark/commit/97956669053646f00131073358e53b05d0c3d5
Yeah we just need to add 1.5.2 as in
https://github.com/apache/spark/commit/97956669053646f00131073358e53b05d0c3d5d0#diff-ada66bbeb2f1327b508232ef6c3805a5
to the master branch as well
Thanks
Shivaram
On Mon, Nov 30, 2015 at 11:38 PM, Alexander Pivovarov
wrote:
> just
+1
On a related note I think making it lightweight will ensure that we
stay on the current release schedule and don't unnecessarily delay 2.0
to wait for new features / big architectural changes.
In terms of fixes to 1.x, I think our current policy of back-porting
fixes to older releases would
Thanks for investigating this. The right place to add these is the
core-site.xml template we have at
https://github.com/amplab/spark-ec2/blob/branch-1.5/templates/root/spark/conf/core-site.xml
and/or
I think that getting them from the ASF mirrors is a better strategy in
general as it'll remove the overhead of keeping the S3 bucket up to
date. It works in the spark-ec2 case because we only support a limited
number of Hadoop versions from the tool. FWIW I don't have write
access to the bucket
op-2.7.1.tar.gz?asjson=1
>
> Thanks for sharing that tip. Looks like you can also use as_json (vs.
> asjson).
>
> Nick
>
>
> On Sun, Nov 1, 2015 at 5:32 PM Shivaram Venkataraman
> <shiva...@eecs.berkeley.edu> wrote:
>>
>> On Sun, Nov 1, 2015 at 2:16 PM, Nic
et back a JSON which has a 'preferred' field set
to the closest mirror.
Shivaram
> Nick
>
>
> On Sun, Nov 1, 2015 at 12:25 PM Shivaram Venkataraman
> <shiva...@eecs.berkeley.edu> wrote:
>>
>> I think that getting them from the ASF mirrors is a better strategy in
>>
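As a sketch of the lookup being described, the ASF closer service returns JSON whose "preferred" field names the closest mirror (the exact download path below is illustrative):

import scala.io.Source

val url = "https://www.apache.org/dyn/closer.lua/spark/spark-1.5.2/spark-1.5.2.tgz?as_json=1"
val mirrorJson = Source.fromURL(url).mkString
// parse with any JSON library and read the "preferred" field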
of
>> these folks have a Spark Cluster and wish to talk to it from RStudio. While
>> that is a bigger task, for now, first step could be not requiring them to
>> download Spark source and run a script that is named install-dev.sh. I filed
>> SPARK-10776 to track this.
>>
>>
>>
As Rui says it would be good to understand the use case we want to
support (supporting CRAN installs could be one for example). I don't
think it should be very hard to do as the RBackend itself doesn't use
the R source files. The RRDD does use it and the value comes from
I think Hao posted a link to the source code in the description of
https://issues.apache.org/jira/browse/SPARK-6803
On Wed, Sep 16, 2015 at 10:06 AM, Reynold Xin wrote:
> You should reach out to the speakers directly.
>
>
> On Wed, Sep 16, 2015 at 9:52 AM, Renyi Xiong
line arguments
> from spark-submit and setting them with SparkConf to R driver's in-process
> JVM through JNI?
>
> On Thu, Sep 10, 2015 at 9:29 PM, Shivaram Venkataraman
> <shiva...@eecs.berkeley.edu> wrote:
>>
>> Yeah in addition to the downside of having 2 JVMs
FYI
The staging repository published as version 1.5.0 is at
https://repository.apache.org/content/repositories/orgapachespark-1136
while the staging repository published as version 1.5.0-rc1 is at
https://repository.apache.org/content/repositories/orgapachespark-1137
Thanks
Shivaram
On Thu, Aug
-thoughts.com wrote:
thx for this, let me know if you need help
2015-08-16 23:38 GMT+02:00 Shivaram Venkataraman
shiva...@eecs.berkeley.edu:
I just investigated this and this is happening because of a Maven
version requirement not being met. I'll look at modifying the build
scripts to use Maven
I just investigated this and this is happening because of a Maven
version requirement not being met. I'll look at modifying the build
scripts to use Maven 3.3.3 (with build/mvn --force ?)
Shivaram
On Sun, Aug 16, 2015 at 10:16 AM, Olivier Girardot
o.girar...@lateral-thoughts.com wrote:
Hi
Thanks for the catch. Could you send a PR with this diff?
On Fri, Aug 14, 2015 at 10:30 AM, Shkurenko, Alex ashkure...@enova.com wrote:
Got an issue similar to https://issues.apache.org/jira/browse/SPARK-8897,
but with the Decimal datatype coming from a Postgres DB:
//Set up SparkR
The in-process JNI only works out when the R process comes up first
and we launch a JVM inside it. In many deploy modes like YARN (or
actually in anything using spark-submit) the JVM comes up first and we
launch R after that. Using an inter-process solution helps us cover
both use cases
Thanks
PythonRDD.scala has a number of PySpark specific conventions (for
example worker reuse, exceptions etc.) and PySpark specific protocols
(e.g. for communicating accumulators, broadcasts between the JVM and
Python etc.). While it might be possible to refactor the two classes
to share some more code
I sent a note to the Mesos developers and created
https://github.com/apache/spark/pull/7899 to change the repository
pointer. There are 3-4 open PRs right now in the mesos/spark-ec2
repository and I'll work on migrating them to amplab/spark-ec2 later
today.
My thoughts on moving the python script
Hi Mesos developers
The Apache Spark project has been hosting using
https://github.com/mesos/spark-ec2 as a supporting repository for some
of our EC2 scripts. This is a remnant from the days when the Spark
project itself was hosted at github.com/mesos/spark. Based on
discussions in the Spark
? it feels like something that would be
good to do before 1.5.0, if it's going to happen soon.
On Wed, Jul 22, 2015 at 6:59 AM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
Yeah I'll send a note to the mesos dev list just to make sure they are
informed.
Shivaram
On Tue, Jul 21
assume (at least part of it) is owned by the mesos project and so
its PMC?
- Mridul
On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
There is technically no PMC for the spark-ec2 project (I guess we are
kind
of establishing one right now). I haven't
to
prevent future issues with apache.
Regards,
Mridul
On Mon, Jul 20, 2015 at 12:01 PM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
I've created https://github.com/amplab/spark-ec2 and added an initial
set of
committers. Note that this is not a fork of the existing
github.com
, Mridul Muralidharan mri...@gmail.com
wrote:
If I am not wrong, since the code was hosted within mesos project
repo, I assume (at least part of it) is owned by the mesos project and so
its PMC?
- Mridul
On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote
be migrating some PRs / closing them in the old repo and will also
update the README in that repo.
Thanks
Shivaram
On Fri, Jul 17, 2015 at 3:00 PM, Sean Owen so...@cloudera.com wrote:
On Fri, Jul 17, 2015 at 6:58 PM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
I am not sure why the ASF JIRA
avgTime += (System.nanoTime() - t) / 1e9
oldRDD = newRDD
i += 1
}
println("Avg iteration time: " + avgTime / numIterations)
Best regards, Alexander
*From:* Shivaram Venkataraman [mailto:shiva...@eecs.berkeley.edu]
*Sent:* Friday, July 10, 2015 10:04 PM
Some replies inline
On Wed, Jul 15, 2015 at 1:08 AM, Sean Owen so...@cloudera.com wrote:
The code can continue to be a good reference implementation, no matter
where it lives. In fact, it can be a better more complete one, and
easier to update.
I agree that ec2/ needs to retain some kind of
Both SparkR and the PySpark API call into the JVM Spark API (i.e.
JavaSparkContext, JavaRDD etc.). They use different methods (Py4J vs. the
R-Java bridge) to call into the JVM based on libraries available / features
supported in each language. So for Haskell, one would need to see what is
the best
I think moving the repo-location and re-organizing the python code to
handle dependencies, testing etc. sounds good to me. However, I think there
are a couple of things which I am not sure about
1. I strongly believe that we should preserve existing command-line in
ec2/spark-ec2 (i.e. the shell
I think you need to do `newRDD.cache()` and `newRDD.count` before you do
oldRDD.unpersist(true) -- Otherwise it might be recomputing all the
previous iterations each time.
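In code, the suggested ordering looks roughly like this, assuming an active SparkContext sc (the map is just a placeholder transformation):

val numIterations = 10
var oldRDD = sc.parallelize(1 to 1000000).cache()
oldRDD.count()               // materialize the starting point once
for (i <- 1 to numIterations) {
  val newRDD = oldRDD.map(_ + 1).cache()
  newRDD.count()             // force newRDD before dropping its parent
  oldRDD.unpersist(true)     // blocking unpersist; newRDD no longer needs oldRDD
  oldRDD = newRDD
}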
Thanks
Shivaram
On Fri, Jul 10, 2015 at 7:44 PM, Ulanov, Alexander alexander.ula...@hp.com
wrote:
Hi,
I am interested
...@hp.com
wrote:
Hi Shivaram,
Thank you for the suggestion! If I do .cache and .count, each iteration takes
much more time, which is spent in GC. Is it normal?
On 10 July 2015, at 21:23, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
I think you
The R and Python implementations differ in how they communicate with the
JVM, so there is no invariant there per se.
Thanks
Shivaram
On Thu, Jul 9, 2015 at 10:40 PM, Vasili I. Galchin vigalc...@gmail.com
wrote:
Hello,
Just trying to get up to speed (a week... please be patient with me).