Re: [VOTE] Update the committer guidelines to clarify when to commit changes.

2020-07-31 Thread Shivaram Venkataraman
+1 Thanks Shivaram On Thu, Jul 30, 2020 at 11:56 PM Wenchen Fan wrote: > > +1, thanks for driving it, Holden! > > On Fri, Jul 31, 2020 at 10:24 AM Holden Karau wrote: >> >> +1 from myself :) >> >> On Thu, Jul 30, 2020 at 2:53 PM Jungtaek Lim >> wrote: >>> >>> +1 (non-binding, I guess) >>>

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-07-14 Thread Shivaram Venkataraman
Hi all Just wanted to check if there are any blockers that we are still waiting for to start the new release process. Thanks Shivaram On Sun, Jul 5, 2020, 06:51 wuyi wrote: > Ok, after having another look, I think it only affects local cluster deploy > mode, which is for testing only. > > >

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-07-01 Thread Shivaram Venkataraman
>> https://issues.apache.org/jira/browse/SPARK-32136 >> >> >> >> Thanks, >> >> Jason. >> >> >> >> From: Jungtaek Lim >> Date: Wednesday, 1 July 2020 at 10:20 am >> To: Shivaram Venkataraman >> Cc: Prashant Shar

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-30 Thread Shivaram Venkataraman
>> "Jungtaek Lim" <kabhwan.opensou...@gmail.com>; "Jules Damji"; "Holden Karau"; "Reynold Xin"; "Shivaram Venkataraman"; "Yuanjian Li" <xyliyuanj...@gmail.com>; "Spark dev list"

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-23 Thread Shivaram Venkataraman
+1 Thanks Yuanjian -- I think it'll be great to have a 3.0.1 release soon. Shivaram On Tue, Jun 23, 2020 at 3:43 AM Takeshi Yamamuro wrote: > > Thanks for the heads-up, Yuanjian! > > > I also noticed branch-3.0 already has 39 commits after Spark 3.0.0. > wow, the updates are so quick. Anyway,

Re: SparkR latest API docs missing?

2019-05-08 Thread Shivaram Venkataraman
, 2019 at 11:27 AM Shivaram Venkataraman wrote: > > Actually I found this while I was uploading the latest release to CRAN > -- these docs should be generated as a part of the release process > though and shouldn't be related to CRAN. > > On Wed, May 8, 2019 at 11:24 AM Sean Owen

Re: SparkR latest API docs missing?

2019-05-08 Thread Shivaram Venkataraman
it due to the > additional CRAN processes. > > On Wed, May 8, 2019 at 11:23 AM Shivaram Venkataraman > wrote: > > > > I just noticed that the SparkR API docs are missing at > > https://spark.apache.org/docs/latest/api/R/index.html --- It looks > > like they were miss

SparkR latest API docs missing?

2019-05-08 Thread Shivaram Venkataraman
I just noticed that the SparkR API docs are missing at https://spark.apache.org/docs/latest/api/R/index.html --- It looks like they were missing from the 2.4.3 release? Thanks Shivaram

Fwd: CRAN submission SparkR 2.3.3

2019-02-24 Thread Shivaram Venkataraman
is that if there have not been too many changes since 2.3.3, how much effort would it be to cut a 2.3.4 with just this change. Thanks Shivaram -- Forwarded message - From: Uwe Ligges Date: Sun, Feb 17, 2019 at 12:28 PM Subject: Re: CRAN submission SparkR 2.3.3 To: Shivaram Venkataraman , CRAN

Re: Vectorized R gapply[Collect]() implementation

2019-02-09 Thread Shivaram Venkataraman
Those speedups look awesome! Great work Hyukjin! Thanks Shivaram On Sat, Feb 9, 2019 at 7:41 AM Hyukjin Kwon wrote: > > Guys, as continuation of Arrow optimization for R DataFrame to Spark > DataFrame, > > I am trying to make a vectorized gapply[Collect] implementation as an > experiment like

Re: Arrow optimization in conversion from R DataFrame to Spark DataFrame

2018-11-09 Thread Shivaram Venkataraman
Thanks Hyukjin! Very cool results Shivaram On Fri, Nov 9, 2018 at 10:58 AM Felix Cheung wrote: > > Very cool! > > > > From: Hyukjin Kwon > Sent: Thursday, November 8, 2018 10:29 AM > To: dev > Subject: Arrow optimization in conversion from R DataFrame to Spark

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-07 Thread Shivaram Venkataraman
> From: Sean Owen > Sent: Tuesday, November 6, 2018 10:51 AM > To: Shivaram Venkataraman > Cc: Felix Cheung; Wenchen Fan; Matei Zaharia; dev > Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0 > > I think the second option, to skip the tests, is best right now, if

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-06 Thread Shivaram Venkataraman
elease of 2.4.0 > > > > From: Wenchen Fan > Sent: Tuesday, November 6, 2018 8:51 AM > To: Felix Cheung > Cc: Matei Zaharia; Sean Owen; Spark dev list; Shivaram Venkataraman > Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0 > > Do you mea

Re: Removing non-deprecated R methods that were deprecated in Python, Scala?

2018-11-06 Thread Shivaram Venkataraman
Yep. That sounds good to me. On Tue, Nov 6, 2018 at 11:06 AM Sean Owen wrote: > > Sounds good, remove in 3.1? I can update accordingly. > > On Tue, Nov 6, 2018, 10:46 AM Reynold Xin > >> Maybe deprecate and remove in next version? It is bad to just remove a >> method without deprecation notice.

Re: [R] discuss: removing lint-r checks for old branches

2018-08-10 Thread Shivaram Venkataraman
Sounds good to me as well. Thanks Shane. Shivaram On Fri, Aug 10, 2018 at 1:40 PM Reynold Xin wrote: > > SGTM > > On Fri, Aug 10, 2018 at 1:39 PM shane knapp wrote: >> >> https://issues.apache.org/jira/browse/SPARK-25089 >> >> basically since these branches are old, and there will be a greater

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.2.2

2018-07-09 Thread Shivaram Venkataraman
m > > On Monday, July 9, 2018, 4:50:18 PM CDT, Shivaram Venkataraman > wrote: > > > Yes. I think Felix checked in a fix to ignore tests run on java > versions that are not Java 8 (I think the fix was in > https://github.com/apache/spark/pull/21666 which is in 2.3.2) > &

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.2.2

2018-07-09 Thread Shivaram Venkataraman
Java 9. Spark doesn't > support that. Is there any way to tell CRAN this should not be tested? > > On Mon, Jul 9, 2018, 4:17 PM Shivaram Venkataraman > wrote: >> >> The upcoming 2.2.2 release was submitted to CRAN. I think there are >> some known issues on

Fwd: [CRAN-pretest-archived] CRAN submission SparkR 2.2.2

2018-07-09 Thread Shivaram Venkataraman
rvice Flavor: r-devel-linux-x86_64-debian-gcc, r-devel-windows-ix86+x86_64 Check: CRAN incoming feasibility, Result: WARNING Maintainer: 'Shivaram Venkataraman ' New submission Package was archived on CRAN Insufficient package version (submitted: 2.2.2, existing: 2.3.0) Possibly mis-spe

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.3.1

2018-06-12 Thread Shivaram Venkataraman
e Oracle JDK? > > ____ > From: Shivaram Venkataraman > Sent: Tuesday, June 12, 2018 3:17:52 PM > To: dev > Cc: Felix Cheung > Subject: Fwd: [CRAN-pretest-archived] CRAN submission SparkR 2.3.1 > > Corresponding to the Spark 2.3.1 release, I submitted the SparkR build > to C

Fwd: [CRAN-pretest-archived] CRAN submission SparkR 2.3.1

2018-06-12 Thread Shivaram Venkataraman
evel-windows-ix86+x86_64 Check: CRAN incoming feasibility, Result: NOTE Maintainer: 'Shivaram Venkataraman ' New submission Package was archived on CRAN Possibly mis-spelled words in DESCRIPTION: Frontend (4:10, 5:28) CRAN repository db overrides: X-CRAN-Comment: Archived

Re: [VOTE] SPIP ML Pipelines in R

2018-05-31 Thread Shivaram Venkataraman
Hossein -- Can you clarify what the resolution was on the repository / release issue discussed in the SPIP? Shivaram On Thu, May 31, 2018 at 9:06 AM, Felix Cheung wrote: > +1 > With my concerns in the SPIP discussion. > > > From: Hossein > Sent: Wednesday, May 30,

Re: SparkR was removed from CRAN on 2018-05-01

2018-05-29 Thread Shivaram Venkataraman
51 > > On Tue, May 29, 2018 at 1:52 PM, Shivaram Venkataraman > wrote: >> >> Yes. That is correct >> >> Shivaram >> >> On Tue, May 29, 2018 at 11:48 AM, Hossein wrote: >> > I guess this relates to our conversation on the SPIP. When this happe

Re: SparkR was removed from CRAN on 2018-05-01

2018-05-29 Thread Shivaram Venkataraman
Yes. That is correct Shivaram On Tue, May 29, 2018 at 11:48 AM, Hossein wrote: > I guess this relates to our conversation on the SPIP. When this happens, do > we wait for a new minor release to submit it to CRAN again? > > --Hossein > > On Fri, May 25, 2018 at 5:11 PM, Felix Cheung > wrote:

Re: Time for 2.3.1?

2018-05-13 Thread Shivaram Venkataraman
+1 We had a SparkR fix for CRAN SystemRequirements that will also be good to get out. Shivaram On Fri, May 11, 2018 at 12:34 PM, Henry Robinson wrote: > https://github.com/apache/spark/pull/21302 > > On 11 May 2018 at 11:47, Henry Robinson wrote: > >> I was

Re: Integrating ML/DL frameworks with Spark

2018-05-08 Thread Shivaram Venkataraman
> - Fault tolerance and execution model: Spark assumes fine-grained task recovery, i.e. if something fails, only that task is rerun. This doesn’t match the execution model of distributed ML/DL frameworks that are typically MPI-based, and rerunning a single task would

Re: [Spark][Scheduler] Spark DAGScheduler scheduling performance hindered on JobSubmitted Event

2018-03-06 Thread Shivaram Venkataraman
The problem with doing work in the callsite thread is that there are a number of data structures that are updated during job submission and these data structures are guarded by the event loop ensuring only one thread accesses them. I dont think there is a very easy fix for this given the
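The pattern described above -- all mutations of the scheduler's bookkeeping funneled through one event-loop thread instead of locks -- can be sketched in a few lines. This is an illustrative stand-in (hypothetical class and event names), not Spark's actual DAGScheduler code:

```python
import queue
import threading

class EventLoop:
    """Single-consumer event loop: many threads post events, but only
    the loop thread ever touches `self.state`, so it needs no lock."""

    def __init__(self):
        self.state = {}               # mutated by the event thread alone
        self._events = queue.Queue()  # thread-safe handoff point
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            event = self._events.get()
            if event is None:         # poison pill to stop the loop
                return
            name, payload = event
            # e.g. a "JobSubmitted" event updating scheduler bookkeeping
            self.state.setdefault(name, []).append(payload)

    def post(self, name, payload):
        """Called from any caller thread; returns immediately."""
        self._events.put((name, payload))

    def stop(self):
        self._events.put(None)
        self._thread.join()

loop = EventLoop()
for i in range(100):
    loop.post("JobSubmitted", i)
loop.stop()
print(len(loop.state["JobSubmitted"]))  # 100
```

Moving work into the caller's thread (the "callsite thread" above) would mean two threads touching `state` concurrently, which is exactly what the single event-loop thread is there to prevent.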

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-19 Thread Shivaram Venkataraman
t should not be in the release) > > Thanks! > > _ > From: Shivaram Venkataraman <shiva...@eecs.berkeley.edu> > Sent: Tuesday, February 20, 2018 2:24 AM > Subject: Re: [VOTE] Spark 2.3.0 (RC4) > To: Felix Cheung <felixcheun...@hotmail.com> > Cc: Sean Owen &

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-19 Thread Shivaram Venkataraman
FWIW The search result link works for me Shivaram On Mon, Feb 19, 2018 at 6:21 PM, Felix Cheung wrote: > These are two separate things: > > Does the search result links work for you? > > The second is the dist location we are voting on has a .iml file. > >

Re: [RESULT][VOTE] Spark 2.2.1 (RC2)

2017-12-13 Thread Shivaram Venkataraman
Cheung <felixche...@apache.org> >>> wrote: >>> >>>> This vote passes. Thanks everyone for testing this release. >>>> >>>> >>>> +1: >>>> >>>> Sean Owen (binding) >>>> >>>> Herman van Hö

Re: [VOTE] Spark 2.2.1 (RC2)

2017-11-29 Thread Shivaram Venkataraman
+1 SHA, MD5 and signatures look fine. Built and ran Maven tests on my Macbook. Thanks Shivaram On Wed, Nov 29, 2017 at 10:43 AM, Holden Karau wrote: > +1 (non-binding) > > PySpark install into a virtualenv works, PKG-INFO looks correctly > populated (mostly checking for

Re: What is d3kbcqa49mib13.cloudfront.net ?

2017-09-13 Thread Shivaram Venkataraman
repositories. Incorporating something that is not completely > trusted or approved into the process of building something that we are then > going to approve as trusted is different from the prior use of cloudfront. > > On Wed, Sep 13, 2017 at 10:26 AM, Shivaram Venkataram

Re: What is d3kbcqa49mib13.cloudfront.net ?

2017-09-13 Thread Shivaram Venkataraman
The bucket comes from Cloudfront, a CDN thats part of AWS. There was a bunch of discussion about this back in 2013 https://lists.apache.org/thread.html/9a72ff7ce913dd85a6b112b1b2de536dcda74b28b050f70646aba0ac@1380147885@%3Cdev.spark.apache.org%3E Shivaram On Wed, Sep 13, 2017 at 9:30 AM, Sean

Submitting SparkR to CRAN

2017-05-09 Thread Shivaram Venkataraman
Closely related to the PyPi upload thread (https://s.apache.org/WLtM), I just wanted to give a heads up that we are working on submitting SparkR from Spark 2.1.1 as a package to CRAN. The package submission is under review with CRAN right now and I will post updates to this thread. The main

Re: Build completed: spark 866-master

2017-03-04 Thread Shivaram Venkataraman
https://www.appveyor.com/docs/notifications/#global-email-notifications). >> >> > Warning: Notifications defined on project settings UI are merged with >> notifications defined in appveyor.yml. >> >> Should we maybe file an INFRA JIRA to check and ask this? >> >> >>

Fwd: Build completed: spark 866-master

2017-03-04 Thread Shivaram Venkataraman
I'm not sure why the AppVeyor updates are coming to the dev list. Hyukjin -- Do you know if we made any recent changes that might have caused this ? Thanks Shivaram -- Forwarded message -- From: AppVeyor Date: Sat, Mar 4, 2017 at 2:46 PM Subject: Build

Re: Can anyone edit JIRAs SPARK-19191 to SPARK-19202?

2017-01-13 Thread Shivaram Venkataraman
FWIW there is an option to Delete the issue (in More -> Delete). Shivaram On Fri, Jan 13, 2017 at 8:11 AM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote: > I can't see the resolve button either - Maybe we can forward this to > Apache Infra and see if they can clo

Re: Can anyone edit JIRAs SPARK-19191 to SPARK-19202?

2017-01-13 Thread Shivaram Venkataraman
I can't see the resolve button either - Maybe we can forward this to Apache Infra and see if they can close these issues ? Shivaram On Fri, Jan 13, 2017 at 6:35 AM, Sean Owen wrote: > Yes, I'm asking about a specific range: 19191 - 19202. These seem to be the > ones created

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-15 Thread Shivaram Venkataraman
In addition to usual binary artifacts, this is the first release where we have installable packages for Python [1] and R [2] that are part of the release. I'm including instructions to test the R package below. Holden / other Python developers can chime in if there are special instructions to

Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-08 Thread Shivaram Venkataraman
+0 I am not sure how much of a problem this is but the pip packaging seems to have changed the size of the hadoop-2.7 artifact. As you can see in http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-bin/, the Hadoop 2.7 build is 359M almost double the size of the other Hadoop

Re: [ANNOUNCE] Apache Spark 2.0.2

2016-11-14 Thread Shivaram Venkataraman
FWIW 2.0.1 is also used in the 'Link With Spark' and 'Spark Source Code Management' sections in that page. Shivaram On Mon, Nov 14, 2016 at 11:11 PM, Reynold Xin wrote: > It's on there on the page (both the release notes and the download version > dropdown). > > The one

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-14 Thread Shivaram Venkataraman
The release is available on http://www.apache.org/dist/spark/ and its on Maven central http://repo1.maven.org/maven2/org/apache/spark/spark-core_2.11/2.0.2/ I guess Reynold hasn't yet put together the release notes / updates to the website. Thanks Shivaram On Mon, Nov 14, 2016 at 12:49 PM,

Re: statistics collection and propagation for cost-based optimizer

2016-11-14 Thread Shivaram Venkataraman
Do we have any query workloads for which we can benchmark these proposals in terms of performance ? Thanks Shivaram On Sun, Nov 13, 2016 at 5:53 PM, Reynold Xin wrote: > One additional note: in terms of size, the size of a count-min sketch with > eps = 0.1% and confidence
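The count-min sketch size mentioned in the quoted message follows from the standard sizing formulas (width w = ceil(e/eps), depth d = ceil(ln(1/delta))). The preview truncates before the confidence value, so the 99% used below is an assumption for illustration only:

```python
import math

def cms_dimensions(eps, delta):
    """Standard Count-Min sketch sizing: width bounds the additive
    error to eps * N; depth makes that bound hold with
    probability >= 1 - delta."""
    width = math.ceil(math.e / eps)
    depth = math.ceil(math.log(1.0 / delta))
    return width, depth

# eps = 0.1% as in the thread; delta = 0.01 (99% confidence) is assumed.
w, d = cms_dimensions(eps=0.001, delta=0.01)
print(w, d)       # 2719 5
print(w * d * 4)  # 54380 bytes with 4-byte counters (~53 KB)
```

At these parameters the sketch stays in the tens of kilobytes, which is why it is cheap enough to collect per column for the optimizer.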

Re: StructuredStreaming status

2016-10-19 Thread Shivaram Venkataraman
At the AMPLab we've been working on a research project that looks at just the scheduling latencies and on techniques to get lower scheduling latency. It moves away from the micro-batch model, but reuses the fault tolerance etc. in Spark. However we haven't yet figured out all the parts in

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-18 Thread Shivaram Venkataraman
+1 - Given that our website is now on github (https://github.com/apache/spark-website), I think we can move most of our wiki into the main website. That way we'll only have two sources of documentation to maintain: A release specific one in the main repo and the website which is more long lived.

Re: StructuredStreaming Custom Sinks (motivated by Structured Streaming Machine Learning)

2016-10-13 Thread Shivaram Venkataraman
on isolating specific changes that are required etc. It'd also be great to hear other approaches / next steps to concretize some of these goals. Thanks Shivaram On Thu, Oct 13, 2016 at 8:39 AM, Fred Reiss <freiss@gmail.com> wrote: > On Tue, Oct 11, 2016 at 11:02 AM, Shivaram Venkataraman

Re: StructuredStreaming Custom Sinks (motivated by Structured Streaming Machine Learning)

2016-10-11 Thread Shivaram Venkataraman
Thanks Fred - that is very helpful. > Delivering low latency, high throughput, and stability simultaneously: Right > now, our own tests indicate you can get at most two of these characteristics > out of Spark Streaming at the same time. I know of two parties that have > abandoned Spark Streaming

Re: [ANNOUNCE] Announcing Spark 2.0.1

2016-10-05 Thread Shivaram Venkataraman
Yeah I see the apache maven repos have the 2.0.1 artifacts at https://repository.apache.org/content/repositories/releases/org/apache/spark/spark-core_2.11/ -- Not sure why they haven't synced to maven central yet Shivaram On Wed, Oct 5, 2016 at 8:37 PM, Luciano Resende

Re: [discuss] Spark 2.x release cadence

2016-09-27 Thread Shivaram Venkataraman
+1 I think having a 4 month window instead of a 3 month window sounds good. However I think figuring out a timeline for maintenance releases would also be good. This is a common concern that comes up in many user threads and it'll be better to have some structure around this. It doesn't need to

Re: StructuredStreaming Custom Sinks (motivated by Structured Streaming Machine Learning)

2016-09-26 Thread Shivaram Venkataraman
Disclaimer - I am not very closely involved with Structured Streaming design / development, so this is just my two cents from looking at the discussion in the linked JIRAs and PRs. It seems to me there are a couple of issues being conflated here: (a) is the question of how to specify or add more

Re: R docs no longer building for branch-2.0

2016-09-22 Thread Shivaram Venkataraman
I looked into this and found the problem. Will send a PR now to fix this. If you are curious about what is happening here: When we build the docs separately we don't have the JAR files from the Spark build in the same tree. We added a new set of docs recently in SparkR called an R vignette that

Re: Change the settings in AppVeyor to prevent triggering the tests in other PRs in other branches

2016-09-09 Thread Shivaram Venkataraman
s to other branches on branch-1.5 and lower >> versions, I think it'd be fine. >> >> One concern is, I am not sure if SparkR tests can pass on branch-1.6 (I >> checked it passes on branch-2.0 before). >> >> I can try to check if it passes and identify the related caus

Re: Change the settings in AppVeyor to prevent triggering the tests in other PRs in other branches

2016-09-09 Thread Shivaram Venkataraman
out to who's in charge of the account :). > > > On 10 Sep 2016 12:41 a.m., "Shivaram Venkataraman" > <shiva...@eecs.berkeley.edu> wrote: >> >> Thanks for debugging - I'll reply on >> https://issues.apache.org/jira/browse/INFRA-12590 and ask for this >

Re: Change the settings in AppVeyor to prevent triggering the tests in other PRs in other branches

2016-09-09 Thread Shivaram Venkataraman
Thanks for debugging - I'll reply on https://issues.apache.org/jira/browse/INFRA-12590 and ask for this change. FYI I don't think any of the committers have access to the appveyor account which is at https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark . To request changes that need to be

Re: Discuss SparkR executors/workers support virtualenv

2016-09-07 Thread Shivaram Venkataraman
I think this makes sense -- making it easier to use additional R packages would be a good feature. I am not sure we need Packrat for this use case though. Lets continue discussion on the JIRA at https://issues.apache.org/jira/browse/SPARK-17428 Thanks Shivaram On Tue, Sep 6, 2016 at 11:36 PM,

Re: sparkR array type not supported

2016-09-02 Thread Shivaram Venkataraman
I think it needs a type for the elements in the array. For example f <- structField("x", "array<double>") Thanks Shivaram On Fri, Sep 2, 2016 at 8:26 AM, Paul R wrote: > Hi there, > > I’ve noticed the following command in sparkR > field = structField(“x”, “array”) > > Throws

Re: KMeans calls takeSample() twice?

2016-08-30 Thread Shivaram Venkataraman
I think takeSample itself runs multiple jobs if the amount of samples collected in the first pass is not enough. The comment and code path at https://github.com/apache/spark/blob/412b0e8969215411b97efd3d0984dc6cac5d31e0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L508 should explain when
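The resample-on-shortfall behavior described above can be sketched without Spark. This is illustrative only (Spark's actual `RDD.takeSample` uses different sampling bounds): each pass samples elements independently with probability p, and if the pass collects fewer than requested, p is boosted and the sampling is rerun -- each rerun corresponding to an extra Spark job:

```python
import random

def take_sample(data, num, seed=42):
    """Toy version of sample-then-retry: keep rerunning the sampling
    pass with a boosted inclusion probability until we have at least
    `num` elements, then trim down to exactly `num`."""
    rng = random.Random(seed)
    p = min(1.0, 1.2 * num / len(data))  # slight over-sampling factor
    passes = 0
    while True:
        passes += 1
        sampled = [x for x in data if rng.random() < p]
        if len(sampled) >= num:
            rng.shuffle(sampled)
            return sampled[:num], passes
        p = min(1.0, p * 2)  # boost and rerun, like a second Spark job

sample, passes = take_sample(list(range(10000)), 100)
print(len(sample))  # 100
```

With a generous over-sampling factor the first pass usually suffices, which is why the double job submission only shows up occasionally.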

Re: Spark R - Loading Third Party R Library in YARN Executors

2016-08-17 Thread Shivaram Venkataraman
I think you can also pass in a zip file using the --files option (http://spark.apache.org/docs/latest/running-on-yarn.html has some examples). The files should then be present in the current working directory of the driver R process. Thanks Shivaram On Wed, Aug 17, 2016 at 4:16 AM, Felix Cheung

Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-20 Thread Shivaram Venkataraman
+1 SHA and MD5 sums match for all binaries. Docs look fine this time around. Built and ran `dev/run-tests` with Java 7 on a linux machine. No blocker bugs on JIRA and the only critical bug with target as 2.0.0 is SPARK-16633, which doesn't look like a release blocker. I also checked issues which

Re: [VOTE] Release Apache Spark 2.0.0 (RC4)

2016-07-15 Thread Shivaram Venkataraman
Hashes, sigs match. I built and ran tests with Hadoop 2.3 ("-Pyarn -Phadoop-2.3 -Phive -Pkinesis-asl -Phive-thriftserver"). I couldn't get the following tests to pass but I think it might be something specific to my setup as Jenkins on branch-2.0 seems quite stable. [error] Failed tests: [error]

Re: [VOTE] Release Apache Spark 2.0.0 (RC4)

2016-07-14 Thread Shivaram Venkataraman
I think the docs build was broken because of https://issues.apache.org/jira/browse/SPARK-16553 - A fix has been merged and we are testing it now Shivaram On Thu, Jul 14, 2016 at 1:56 PM, Matthias Niehoff wrote: > Some of the programming guides in the docs only

Re: Call to new JObject sometimes returns an empty R environment

2016-07-05 Thread Shivaram Venkataraman
-sparkr-dev@googlegroups +dev@spark.apache.org [Please send SparkR development questions to the Spark user / dev mailing lists. Replies inline] > From: > Date: Tue, Jul 5, 2016 at 3:30 AM > Subject: Call to new JObject sometimes returns an empty R environment > To:

Re: spark-ec2 scripts with spark-2.0.0-preview

2016-06-14 Thread Shivaram Venkataraman
Can you open an issue on https://github.com/amplab/spark-ec2 ? I think we should be able to escape the version string and pass the 2.0.0-preview through the scripts Shivaram On Tue, Jun 14, 2016 at 12:07 PM, Sunil Kumar wrote: > Hi, > > The spark-ec2 scripts are

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-07 Thread Shivaram Venkataraman
As far as I know the process is just to copy docs/_site from the build to the appropriate location in the SVN repo (i.e. site/docs/2.0.0-preview). Thanks Shivaram On Tue, Jun 7, 2016 at 8:14 AM, Sean Owen wrote: > As a stop-gap, I can edit that page to have a small section

Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

2016-05-12 Thread Shivaram Venkataraman
On Thu, May 12, 2016 at 2:29 PM, Reynold Xin wrote: > We currently have three levels of interface annotation: > > - unannotated: stable public API > - DeveloperApi: A lower-level, unstable API intended for developers. > - Experimental: An experimental user-facing API. > > >

Re: SparkR unit test failures on local master

2016-04-28 Thread Shivaram Venkataraman
I just ran the tests using a recently synced master branch and the tests seemed to work fine. My guess is some of the Java classes changed and you need to rebuild Spark ? Thanks Shivaram On Thu, Apr 28, 2016 at 1:19 PM, Gayathri Murali wrote: > Hi All, > > I am

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Shivaram Venkataraman
Overall this sounds good to me. One question I have is that in addition to the ML algorithms we have a number of linear algebra (various distributed matrices) and statistical methods in the spark.mllib package. Is the plan to port or move these to the spark.ml namespace in the 2.x series ? Thanks

Re: Are we running SparkR tests in Jenkins?

2016-01-15 Thread Shivaram Venkataraman
Yes - we should be running R tests AFAIK. That error message is a deprecation warning about the script `bin/sparkR` which needs to be changed in https://github.com/apache/spark/blob/7cd7f2202547224593517b392f56e49e4c94cabc/R/run-tests.sh#L26 to bin/spark-submit. Thanks Shivaram On Fri, Jan 15,

Re: Are we running SparkR tests in Jenkins?

2016-01-15 Thread Shivaram Venkataraman
Ah I see. I wasn't aware of that PR. We should do a find and replace in all the documentation and rest of the repository as well. Shivaram On Fri, Jan 15, 2016 at 3:20 PM, Reynold Xin wrote: > +Shivaram > > Ah damn - we should fix it. > > This was broken by

Re: Specifying Scala types when calling methods from SparkR

2015-12-09 Thread Shivaram Venkataraman
The SparkR callJMethod can only invoke methods as they show up in the Java byte code. So in this case you'll need to check the SparkContext byte code (with javap or something like that) to see how that method looks. My guess is the type is passed in as a class tag argument, so you'll need to do

Re: How to add 1.5.2 support to ec2/spark_ec2.py ?

2015-12-01 Thread Shivaram Venkataraman
ne 54 still has SPARK_EC2_VERSION = "1.5.1" > > On Tue, Dec 1, 2015 at 12:22 AM, Shivaram Venkataraman > <shiva...@eecs.berkeley.edu> wrote: >> >> Yeah we just need to add 1.5.2 as in >> >> https://github.com/apache/spark/commit/97956669053646f00131073358e53b05d0c3d5

Re: How to add 1.5.2 support to ec2/spark_ec2.py ?

2015-12-01 Thread Shivaram Venkataraman
Yeah we just need to add 1.5.2 as in https://github.com/apache/spark/commit/97956669053646f00131073358e53b05d0c3d5d0#diff-ada66bbeb2f1327b508232ef6c3805a5 to the master branch as well Thanks Shivaram On Mon, Nov 30, 2015 at 11:38 PM, Alexander Pivovarov wrote: > just

Re: A proposal for Spark 2.0

2015-11-10 Thread Shivaram Venkataraman
+1 On a related note I think making it lightweight will ensure that we stay on the current release schedule and don't unnecessarily delay 2.0 to wait for new features / big architectural changes. In terms of fixes to 1.x, I think our current policy of back-porting fixes to older releases would

Re: Recommended change to core-site.xml template

2015-11-05 Thread Shivaram Venkataraman
Thanks for investigating this. The right place to add these is the core-site.xml template we have at https://github.com/amplab/spark-ec2/blob/branch-1.5/templates/root/spark/conf/core-site.xml and/or

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Shivaram Venkataraman
I think that getting them from the ASF mirrors is a better strategy in general as it'll remove the overhead of keeping the S3 bucket up to date. It works in the spark-ec2 case because we only support a limited number of Hadoop versions from the tool. FWIW I don't have write access to the bucket

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Shivaram Venkataraman
op-2.7.1.tar.gz?asjson=1 > > Thanks for sharing that tip. Looks like you can also use as_json (vs. > asjson). > > Nick > > > On Sun, Nov 1, 2015 at 5:32 PM Shivaram Venkataraman > <shiva...@eecs.berkeley.edu> wrote: >> >> On Sun, Nov 1, 2015 at 2:16 PM, Nic

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Shivaram Venkataraman
et back a JSON which has a 'preferred' field set to the closest mirror. Shivaram > Nick > > > On Sun, Nov 1, 2015 at 12:25 PM Shivaram Venkataraman > <shiva...@eecs.berkeley.edu> wrote: >> >> I think that getting them from the ASF mirrors is a better strategy in >>
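The mirror-resolution response discussed above can be consumed with a few lines of JSON parsing. The field names `preferred` and `path_info` are what the ASF `closer.cgi` JSON mode returns; the concrete values below are made up for illustration, and no network call is made:

```python
import json

# Hardcoded stand-in for the closer.cgi?as_json=1 response body.
response_text = json.dumps({
    "preferred": "https://downloads.apache.org/",
    "path_info": "hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz",
})

info = json.loads(response_text)
# Join the closest mirror base URL with the requested artifact path.
url = info["preferred"].rstrip("/") + "/" + info["path_info"]
print(url)  # https://downloads.apache.org/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
```

Resolving the mirror this way is what removes the need to keep a separate S3 bucket in sync with every Hadoop release.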

Re: SparkR package path

2015-09-24 Thread Shivaram Venkataraman
of >> these folks have a Spark Cluster and wish to talk to it from RStudio. While >> that is a bigger task, for now, first step could be not requiring them to >> download Spark source and run a script that is named install-dev.sh. I filed >> SPARK-10776 to track this. >> >> >>

Re: SparkR package path

2015-09-22 Thread Shivaram Venkataraman
As Rui says it would be good to understand the use case we want to support (supporting CRAN installs could be one for example). I don't think it should be very hard to do as the RBackend itself doesn't use the R source files. The RRDD does use it and the value comes from

Re: SparkR streaming source code

2015-09-16 Thread Shivaram Venkataraman
I think Hao posted a link to the source code in the description of https://issues.apache.org/jira/browse/SPARK-6803 On Wed, Sep 16, 2015 at 10:06 AM, Reynold Xin wrote: > You should reach out to the speakers directly. > > > On Wed, Sep 16, 2015 at 9:52 AM, Renyi Xiong

Re: SparkR driver side JNI

2015-09-11 Thread Shivaram Venkataraman
line arguments > from spark-submit and setting them with SparkConf to R diver's in-process > JVM through JNI? > > On Thu, Sep 10, 2015 at 9:29 PM, Shivaram Venkataraman > <shiva...@eecs.berkeley.edu> wrote: >> >> Yeah in addition to the downside of having 2 JVMs

Re: [VOTE] Release Apache Spark 1.5.0 (RC1)

2015-08-20 Thread Shivaram Venkataraman
FYI The staging repository published as version 1.5.0 is at https://repository.apache.org/content/repositories/orgapachespark-1136 while the staging repository published as version 1.5.0-rc1 is at https://repository.apache.org/content/repositories/orgapachespark-1137 Thanks Shivaram On Thu, Aug

Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-08-17 Thread Shivaram Venkataraman
-thoughts.com wrote: thx for this, let me know if you need help 2015-08-16 23:38 GMT+02:00 Shivaram Venkataraman shiva...@eecs.berkeley.edu: I just investigated this and this is happening because of a Maven version requirement not being met. I'll look at modifying the build scripts to use Maven

Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-08-16 Thread Shivaram Venkataraman
I just investigated this and this is happening because of a Maven version requirement not being met. I'll look at modifying the build scripts to use Maven 3.3.3 (with build/mvn --force ?) Shivaram On Sun, Aug 16, 2015 at 10:16 AM, Olivier Girardot o.girar...@lateral-thoughts.com wrote: Hi

Re: SparkR DataFrame fail to return data of Decimal type

2015-08-14 Thread Shivaram Venkataraman
Thanks for the catch. Could you send a PR with this diff ? On Fri, Aug 14, 2015 at 10:30 AM, Shkurenko, Alex ashkure...@enova.com wrote: Got an issue similar to https://issues.apache.org/jira/browse/SPARK-8897, but with the Decimal datatype coming from a Postgres DB: //Set up SparkR

Re: SparkR driver side JNI

2015-08-06 Thread Shivaram Venkataraman
The in-process JNI only works out when the R process comes up first and we launch a JVM inside it. In many deploy modes like YARN (or actually in anything using spark-submit) the JVM comes up first and we launch R after that. Using an inter-process solution helps us cover both use cases Thanks

Re: Why SparkR didn't reuse PythonRDD

2015-08-06 Thread Shivaram Venkataraman
PythonRDD.scala has a number of PySpark specific conventions (for example worker reuse, exceptions etc.) and PySpark specific protocols (e.g. for communicating accumulators, broadcasts between the JVM and Python etc.). While it might be possible to refactor the two classes to share some more code

Re: Should spark-ec2 get its own repo?

2015-08-03 Thread Shivaram Venkataraman
I sent a note to the Mesos developers and created https://github.com/apache/spark/pull/7899 to change the repository pointer. There are 3-4 open PRs right now in the mesos/spark-ec2 repository and I'll work on migrating them to amplab/spark-ec2 later today. My thoughts on moving the python script

Moving spark-ec2 to amplab github organization

2015-08-03 Thread Shivaram Venkataraman
Hi Mesos developers The Apache Spark project has been using https://github.com/mesos/spark-ec2 as a supporting repository for some of our EC2 scripts. This is a remnant from the days when the Spark project itself was hosted at github.com/mesos/spark. Based on discussions in the Spark

Re: Should spark-ec2 get its own repo?

2015-07-31 Thread Shivaram Venkataraman
? it feels like something that would be good to do before 1.5.0, if it's going to happen soon. On Wed, Jul 22, 2015 at 6:59 AM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: Yeah I'll send a note to the mesos dev list just to make sure they are informed. Shivaram On Tue, Jul 21

Re: Should spark-ec2 get its own repo?

2015-07-22 Thread Shivaram Venkataraman
assume (at least part of it) is owned by mesos project and so its PMC ? - Mridul On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: There is technically no PMC for the spark-ec2 project (I guess we are kind of establishing one right now). I haven't

Re: Should spark-ec2 get its own repo?

2015-07-21 Thread Shivaram Venkataraman
to prevent future issues with apache. Regards, Mridul On Mon, Jul 20, 2015 at 12:01 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: I've created https://github.com/amplab/spark-ec2 and added an initial set of committers. Note that this is not a fork of the existing github.com

Re: Should spark-ec2 get its own repo?

2015-07-21 Thread Shivaram Venkataraman
, Mridul Muralidharan mri...@gmail.com wrote: If I am not wrong, since the code was hosted within mesos project repo, I assume (at least part of it) is owned by mesos project and so its PMC ? - Mridul On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote

Re: Should spark-ec2 get its own repo?

2015-07-20 Thread Shivaram Venkataraman
be migrating some PRs / closing them in the old repo and will also update the README in that repo. Thanks Shivaram On Fri, Jul 17, 2015 at 3:00 PM, Sean Owen so...@cloudera.com wrote: On Fri, Jul 17, 2015 at 6:58 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: I am not sure why the ASF JIRA

Re: Model parallelism with RDD

2015-07-17 Thread Shivaram Venkataraman
+= (System.nanoTime() - t) / 1e9 oldRDD = newRDD i += 1 } println("Avg iteration time: " + avgTime / numIterations) Best regards, Alexander *From:* Shivaram Venkataraman [mailto:shiva...@eecs.berkeley.edu shiva...@eecs.berkeley.edu] *Sent:* Friday, July 10, 2015 10:04 PM

Re: Should spark-ec2 get its own repo?

2015-07-17 Thread Shivaram Venkataraman
Some replies inline On Wed, Jul 15, 2015 at 1:08 AM, Sean Owen so...@cloudera.com wrote: The code can continue to be a good reference implementation, no matter where it lives. In fact, it can be a better more complete one, and easier to update. I agree that ec2/ needs to retain some kind of

Re: Spark Core and ways of talking to it for enhancing application language support

2015-07-14 Thread Shivaram Venkataraman
Both SparkR and the PySpark API call into the JVM Spark API (i.e. JavaSparkContext, JavaRDD etc.). They use different methods (Py4J vs. the R-Java bridge) to call into the JVM based on libraries available / features supported in each language. So for Haskell, one would need to see what is the best

Re: Should spark-ec2 get its own repo?

2015-07-13 Thread Shivaram Venkataraman
I think moving the repo-location and re-organizing the python code to handle dependencies, testing etc. sounds good to me. However, I think there are a couple of things which I am not sure about 1. I strongly believe that we should preserve existing command-line in ec2/spark-ec2 (i.e. the shell

Re: Model parallelism with RDD

2015-07-10 Thread Shivaram Venkataraman
I think you need to do `newRDD.cache()` and `newRDD.count` before you do oldRDD.unpersist(true) -- Otherwise it might be recomputing all the previous iterations each time. Thanks Shivaram On Fri, Jul 10, 2015 at 7:44 PM, Ulanov, Alexander alexander.ula...@hp.com wrote: Hi, I am interested
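The cache-then-materialize pattern suggested above can be illustrated with a toy lazy-evaluation model. This is plain Python, not Spark — the `LazyRDD` class and its method names are invented for illustration only. It shows why the new RDD must be materialized (`cache()` followed by an action like `count()`) before the old one is unpersisted: otherwise each iteration replays the entire lineage back to the base data.

```python
# Toy lazy "RDD" sketch (NOT Spark code): demonstrates why you must
# cache-and-materialize the new dataset before unpersisting the old one.

class LazyRDD:
    def __init__(self, compute, parent=None):
        self._compute = compute   # function that produces this dataset
        self._parent = parent
        self._cached = None       # materialized data, if cached

    def map(self, f):
        # Child recomputes from the parent unless the parent is cached.
        def compute():
            return [f(x) for x in self.collect()]
        return LazyRDD(compute, parent=self)

    def cache_and_count(self):
        # Analogue of newRDD.cache(); newRDD.count(): materialize NOW.
        self._cached = self._compute()
        return len(self._cached)

    def unpersist(self):
        self._cached = None

    def collect(self):
        if self._cached is not None:
            return self._cached
        return self._compute()    # cache miss: replay the lineage

calls = {"n": 0}
def base_compute():
    calls["n"] += 1               # count how often the base is recomputed
    return list(range(3))

old = LazyRDD(base_compute)
old.cache_and_count()             # base computed once

for _ in range(3):
    new = old.map(lambda x: x + 1)
    new.cache_and_count()         # materialize BEFORE dropping old
    old.unpersist()
    old = new

print(calls["n"])  # → 1: the base lineage was never replayed
```

In real Spark code the loop body is the same idea: `newRDD.cache(); newRDD.count(); oldRDD.unpersist(true)`. If `cache()`/`count()` are skipped, `collect()` on the final RDD would walk the whole chain of unpersisted parents and recompute every previous iteration.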

Re: Model parallelism with RDD

2015-07-10 Thread Shivaram Venkataraman
...@hp.com wrote: Hi Shivaram, Thank you for suggestion! If I do .cache and .count, each iteration take much more time, which is spent in GC. Is it normal? On 10 July 2015, at 21:23, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: I think you

Re: PySpark vs R

2015-07-10 Thread Shivaram Venkataraman
The R and Python implementations differ in how they communicate with the JVM so there is no invariant there per-se. Thanks Shivaram On Thu, Jul 9, 2015 at 10:40 PM, Vasili I. Galchin vigalc...@gmail.com wrote: Hello, Just trying to get up to speed ( a week .. pls be patient with me).
