I haven't worked with Datasets, but would this help?
https://stackoverflow.com/questions/37513667/how-to-create-a-spark-dataset-from-an-rdd
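For an RDD of case-class records, the conversion that linked question describes boils down to `toDS()` (or the explicit `createDataset`). A minimal sketch; the `Record` class, app name, and values here are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

object RddToDatasetExample {
  // Hypothetical record type; any case class with encodable fields works.
  case class Record(id: Int, name: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RddToDatasetExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // brings the Encoder and toDS() into scope

    val rdd = spark.sparkContext.parallelize(Seq(Record(1, "a"), Record(2, "b")))

    val ds1 = rdd.toDS()               // implicit conversion on the RDD
    val ds2 = spark.createDataset(rdd) // explicit equivalent

    ds1.show()
    spark.stop()
  }
}
```

Both forms rely on the implicit `Encoder` that `spark.implicits._` derives for the case class; without that import neither compiles.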
On Jun 23, 2017 5:43 PM, "Keith Chapman" wrote:
> Hi,
>
> I have code that does the following using RDDs,
>
> val
helpful,
> hopefully someone else will be able to explain exactly how this works.
>
--
Saliya Ekanayake, Ph.D
Applied Computer Scientist
Network Dynamics and Simulation Science Laboratory (NDSSL)
Virginia Tech, Blacksburg
>
>
> Yong
>
>
> --
> *From:* Saliya Ekanayake <esal...@gmail.com>
> *Sent:* Wednesday, January 18, 2017 12:33 PM
> *To:* spline_pal...@yahoo.com
> *Cc:* jasbir.s...@accenture.com; User
> *Subject:* Re: Spark #cores
>
> The
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Wed, 18 Jan, 2017 at 10:16 pm, Saliya Ekanayake
> <esal...@gmail.com> wrote:
> Thank you for the quick response. No, this is not Spark SQL. I am running
> the built-in PageRank
or cores 1
> and a default parallelism of 32 over 8 physical nodes.
>
>
>
> The web UI shows it's running on 200 cores. I can't relate this number to
> the parameters I've used. How can I control the parallelism in a more
> deterministic way?
>
>
>
> Thank you,
>
> Saliya
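The cores number in the web UI should be the total granted across all executors (executors × cores per executor), so when it doesn't match, the cluster manager has usually granted more executors or cores than expected. Pinning both numbers explicitly is the usual way to make it deterministic. A rough sketch, assuming YARN-style properties; the values below are examples, not the poster's actual setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Standard Spark properties; values chosen only to illustrate the knobs.
val conf = new SparkConf()
  .setAppName("PageRankParallelism")
  .set("spark.executor.instances", "8")   // one executor per physical node (YARN)
  .set("spark.executor.cores", "1")       // cores per executor
  .set("spark.default.parallelism", "32") // partitions for RDD shuffles

val sc = new SparkContext(conf)

// For GraphX, parallelism ultimately means the number of edge/vertex
// partitions, so it can also be fixed when the graph is loaded, e.g.:
// val graph = GraphLoader.edgeListFile(sc, path, numEdgePartitions = 32)
```

On a standalone cluster the equivalent cap is `spark.cores.max`; without one of these caps, standalone mode grabs all available cores by default.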
Thank you,
Saliya
Just realized the attached file has text formatting wrong. The github link
to the file is
https://github.com/esaliya/graphxprimer/blob/master/src/main/scala-2.10/org/saliya/graphxprimer/PregelExample2.scala
On Tue, Nov 22, 2016 at 3:08 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
Spark would send the same array that it got
after the initial call.
Is there a way to turn off this caching effect?
Thank you,
Saliya
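If the shared array comes from Spark reusing a cached or deserialized vertex attribute, the usual workaround is not to turn caching off but to stop mutating the received array: copy it and return the copy, so no cached instance is silently shared between supersteps. A sketch of a Pregel-style vertex program with `Array[Int]` attributes; the max-merge rule here is invented for illustration:

```scala
// Defensive-copy pattern for a GraphX Pregel vertex program whose
// vertex attribute is a mutable Array[Int].
def vertexProgram(id: Long, attr: Array[Int], msg: Array[Int]): Array[Int] = {
  val updated = attr.clone() // fresh array; never mutate attr in place
  for (i <- updated.indices if i < msg.length)
    updated(i) = math.max(updated(i), msg(i)) // example merge rule only
  updated
}
```

The extra allocation per vertex per superstep is the price of correctness here; in-place mutation only appears to work until a partition is served from cache.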
PregelExample2.rtf
Hi,
I have created a property graph using GraphX. Each vertex has an integer
array as a property. I'd like to update the values of theses arrays without
creating new graph objects.
Is this possible in Spark?
Thank you,
Saliya
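Strictly speaking, GraphX `Graph` objects are immutable, so there is no supported in-place update; `mapVertices` returns a new `Graph`, but it reuses the unchanged edge structure and indices rather than copying everything, so it is cheaper than rebuilding the graph. A sketch, assuming a `graph: Graph[Array[Int], Int]` already exists (the edge attribute type and the `+ 1` update are placeholders):

```scala
import org.apache.spark.graphx._

// mapVertices creates a new Graph value but shares the edge RDD and
// routing tables with the old one; only vertex attributes are rewritten.
val updated: Graph[Array[Int], Int] =
  graph.mapVertices { (id, arr) =>
    arr.map(_ + 1) // build a new array rather than mutating the old one
  }
```

Mutating the arrays inside the existing graph instead may seem to avoid this cost, but it interacts badly with caching, as the follow-up message in this thread found.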
> reduce shuffling by following similar partitioning on
> both RDDs
>
> On Wed, Sep 14, 2016 at 2:00 PM, Saliya Ekanayake <esal...@gmail.com>
> wrote:
>
>> Thank you, but isn't that join going to be too expensive for this?
>>
>> On Tue, Sep 13, 2016 at 11:5
be of
> signature (filename,filecontent).
> 3. Join RDD1 and 2 based on some file name (or some other key).
>
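The steps quoted above could be sketched roughly as below. `keyOf` and the paths are placeholders, and `wholeTextFiles` keys are full file paths, so the keys on both sides have to be normalized to match before the join:

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

// rdd1: records from the driver text file, keyed by file name.
// keyOf is a hypothetical parser that extracts the file-name key from a record.
val rdd1: RDD[(String, String)] = sc.textFile("hdfs:///records.txt")
  .map(line => (keyOf(line), line))

// rdd2: (filename, filecontent) pairs read from HDFS.
val rdd2: RDD[(String, String)] = sc.wholeTextFiles("hdfs:///data/files")

// Partitioning both sides identically before the join avoids shuffling twice.
val part = new HashPartitioner(32)
val joined = rdd1.partitionBy(part).join(rdd2.partitionBy(part))
```

On the cost question raised later in the thread: `wholeTextFiles` loads each file whole into memory, so with ~80K files the file sizes, not the join itself, are usually what decides whether this is feasible.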
> On Wed, Sep 14, 2016 at 1:41 PM, Saliya Ekanayake <esal...@gmail.com>
> wrote:
>
>> 1.) What needs to be parallelized is the work for each of those
e?
> 2. Your first text file has 6M rows, but total number of files~80K. is
> there a scenario where there may not be a file in HDFS corresponding to the
> row in first text file?
> 3. May be a follow up of 1, what is your end goal?
>
> On Wed, Sep 14, 2016 at 12:17 PM, Saliya E
On 13 Sep 2016 11:39 p.m., "Saliya Ekanayake" <esal...@gmail.com> wrote:
>
>> Just wonder if this is possible with Spark?
>>
>> On Mon, Sep 12, 2016 at 12:14 AM, Saliya Ekanayake <esal...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>
Just wonder if this is possible with Spark?
On Mon, Sep 12, 2016 at 12:14 AM, Saliya Ekanayake <esal...@gmail.com>
wrote:
> Hi,
>
> I've got a text file where each line is a record. For each record, I need
> to process a file in HDFS.
>
> So if I represent these reco
() or is there a better solution to that?
Thank you,
Saliya
--
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington