Re: Is there an api in Dataset/Dataframe that does repartitionAndSortWithinPartitions?

2017-06-24 Thread Saliya Ekanayake
I haven't worked with datasets but would this help https://stackoverflow.com/questions/37513667/how-to-create-a-spark-dataset-from-an-rdd ? On Jun 23, 2017 5:43 PM, "Keith Chapman" wrote: > Hi, > > I have code that does the following using RDDs, > > val

Re: Spark #cores

2017-01-18 Thread Saliya Ekanayake
helpful, > hopefully someone else will be able to explain exactly how this works. > -- Saliya Ekanayake, Ph.D Applied Computer Scientist Network Dynamics and Simulation Science Laboratory (NDSSL) Virginia Tech, Blacksburg

Re: Spark #cores

2017-01-18 Thread Saliya Ekanayake
. > > > Yong > > > -- > *From:* Saliya Ekanayake <esal...@gmail.com> > *Sent:* Wednesday, January 18, 2017 12:33 PM > *To:* spline_pal...@yahoo.com > *Cc:* jasbir.s...@accenture.com; User > *Subject:* Re: Spark #cores > > The

Re: Spark #cores

2017-01-18 Thread Saliya Ekanayake
> Sent from Yahoo Mail on Android > <https://overview.mail.yahoo.com/mobile/?.src=Android> > > On Wed, 18 Jan, 2017 at 10:16 pm, Saliya Ekanayake > <esal...@gmail.com> wrote: > Thank you for the quick response. No, this is not Spark SQL. I am running > the built-in PageRan

Re: Spark #cores

2017-01-18 Thread Saliya Ekanayake
or cores 1 > and a default parallelism of 32 over 8 physical nodes. > > > > The web UI shows it's running on 200 cores. I can't relate this number to > the parameters I've used. How can I control the parallelism in a more > deterministic way? > > > > Thank you, > > Saliya

Spark #cores

2017-01-18 Thread Saliya Ekanayake
? Thank you, Saliya -- Saliya Ekanayake, Ph.D Applied Computer Scientist Network Dynamics and Simulation Science Laboratory (NDSSL) Virginia Tech, Blacksburg

Re: Pregel Question

2016-11-22 Thread Saliya Ekanayake
Just realized the attached file has text formatting wrong. The github link to the file is https://github.com/esaliya/graphxprimer/blob/master/src/main/scala-2.10/org/saliya/graphxprimer/PregelExample2.scala On Tue, Nov 22, 2016 at 3:08 PM, Saliya Ekanayake <esal...@gmail.com> wrote:

Pregel Question

2016-11-22 Thread Saliya Ekanayake
Spark would send the same array that it got after the initial call. Is there a way to turn off this caching effect? Thank you, Saliya -- Saliya Ekanayake, Ph.D Applied Computer Scientist Network Dynamics and Simulation Science Laboratory (NDSSL) Virginia Tech, Blacksburg PregelExample2.rtf

GraphX updating vertex property

2016-11-15 Thread Saliya Ekanayake
Hi, I have created a property graph using GraphX. Each vertex has an integer array as a property. I'd like to update the values of these arrays without creating new graph objects. Is this possible in Spark? Thank you, Saliya -- Saliya Ekanayake, Ph.D Applied Computer Scientist Network

Re: Access HDFS within Spark Map Operation

2016-09-13 Thread Saliya Ekanayake
educe shuffling by following similar partitioning on > both RDDs > > On Wed, Sep 14, 2016 at 2:00 PM, Saliya Ekanayake <esal...@gmail.com> > wrote: > >> Thank you, but isn't that join going to be too expensive for this? >> >> On Tue, Sep 13, 2016 at 11:5

Re: Access HDFS within Spark Map Operation

2016-09-13 Thread Saliya Ekanayake
be of > signature (filename,filecontent). > 3. Join RDD1 and 2 based on some file name (or some other key). > > On Wed, Sep 14, 2016 at 1:41 PM, Saliya Ekanayake <esal...@gmail.com> > wrote: > >> 1.) What needs to be parallelized is the work for each of those

Re: Access HDFS within Spark Map Operation

2016-09-13 Thread Saliya Ekanayake
e? > 2. Your first text file has 6M rows, but the total number of files is ~80K. Is > there a scenario where there may not be a file in HDFS corresponding to the > row in the first text file? > 3. Maybe a follow-up to 1: what is your end goal? > > On Wed, Sep 14, 2016 at 12:17 PM, Saliya E

Re: Access HDFS within Spark Map Operation

2016-09-13 Thread Saliya Ekanayake
13 Sep 2016 11:39 p.m., "Saliya Ekanayake" <esal...@gmail.com> wrote: > >> Just wonder if this is possible with Spark? >> >> On Mon, Sep 12, 2016 at 12:14 AM, Saliya Ekanayake <esal...@gmail.com> >> wrote: >> >>> Hi, >>> >

Re: Access HDFS within Spark Map Operation

2016-09-13 Thread Saliya Ekanayake
Just wondering if this is possible with Spark? On Mon, Sep 12, 2016 at 12:14 AM, Saliya Ekanayake <esal...@gmail.com> wrote: > Hi, > > I've got a text file where each line is a record. For each record, I need > to process a file in HDFS. > > So if I represent these reco

Access HDFS within Spark Map Operation

2016-09-11 Thread Saliya Ekanayake
() or is there a better solution to that? Thank you, Saliya -- Saliya Ekanayake Ph.D. Candidate | Research Assistant School of Informatics and Computing | Digital Science Center Indiana University, Bloomington