> a transformation and thus is not
> actually applied until some action (like 'foreach') is called on the
> resulting RDD.
> You can find more information in the Spark Programming Guide
> http://spark.apache.org/docs/latest/programming-guide.html#rdd-operations.
>
> best,
> --Ja
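The laziness described above can be illustrated with plain Scala collection
views (an analogy only; RDDs build a lineage graph rather than a view, but
the deferred-until-forced behavior is the same):

```scala
// A view defers the map, much like an RDD transformation:
var evaluated = 0
val lazyView = List(1, 2, 3).view.map { x => evaluated += 1; x * 2 }
val before = evaluated        // still 0: the map has not run yet
// Forcing the view (comparable to calling an action like foreach):
val result = lazyView.toList  // now the map runs, once per element
```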
I'm currently developing a Spark Streaming application.
I have a function that receives an RDD and an object instance as
parameters, and returns an RDD:
def doTheThing(a: RDD[A], b: B): RDD[C]
Within the function, I do some processing inside a map over the RDD, like
this:
def doTheThing(a:
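The message is truncated above. Purely as a hypothetical sketch of the
shape being described, with Seq standing in for RDD so it runs without a
cluster, and A, B, C as placeholder types (none of this is from the
original mail):

```scala
// Placeholder types for the sketch
case class A(x: Int)
case class B(offset: Int)
case class C(y: Int)

// Seq stands in for RDD here; with Spark the body would look the same,
// since both expose map for per-record processing.
def doTheThing(a: Seq[A], b: B): Seq[C] =
  a.map(rec => C(rec.x + b.offset))
```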
I have a rather odd use case. I have a DataFrame column name with a '+'
character in it.
The app performs some processing steps before determining the column name,
and it
would be much easier to code if I could use the DataFrame filter operations
with a String.
This demonstrates the issue I am having:
The spark-ec2 script generates spark config files from templates. Those are
located here:
https://github.com/amplab/spark-ec2/tree/branch-1.5/templates/root/spark/conf
Note the link is referring to the 1.5 branch.
Is this what you are looking for?
Jeff
On Mon, Oct 5, 2015 at 8:56 AM, Renato
escape weird characters in column names.
>
> On Mon, Oct 5, 2015 at 12:59 AM, Hemminger Jeff <j...@atware.co.jp> wrote:
>
>> I have a rather odd use case. I have a DataFrame column name with a +
>> value in it.
>> The app performs some processing steps before determ
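The exchange above is truncated, but the usual way to escape special
characters such as '+' in a column name inside a Spark SQL expression
string is backtick-quoting, e.g. df.filter("`a+b` > 0"). A minimal helper
(quoteColumn is a hypothetical name, not a Spark API; it follows Spark
SQL's rule of doubling embedded backticks):

```scala
// Wrap a column name in backticks so it can appear in a SQL expression
// string; embedded backticks are doubled, per Spark SQL's quoting rule.
def quoteColumn(name: String): String =
  "`" + name.replace("`", "``") + "`"

val filterExpr = quoteColumn("a+b") + " > 0"  // usable as df.filter(filterExpr)
```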
I am trying to understand the process of caching and specifically what the
behavior is when the cache is full. Please excuse me if this question is a
little vague; I am trying to build my understanding of this process.
I have an RDD that I perform several computations with; I persist it with
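On the cache-full question: Spark drops cached partitions in roughly
least-recently-used order when storage memory fills, and with MEMORY_ONLY
the dropped partitions are simply recomputed from lineage when next needed
(MEMORY_AND_DISK spills them to disk instead). A minimal LRU sketch, as an
analogy only (Spark's BlockManager is far more involved):

```scala
import scala.collection.mutable

// Tiny LRU cache: on overflow, the least recently touched entry is
// dropped (analogous to Spark evicting the oldest cached partitions).
class LruCache[K, V](capacity: Int) {
  private val entries = mutable.LinkedHashMap.empty[K, V]
  def get(k: K): Option[V] =
    entries.remove(k).map { v => entries.put(k, v); v }  // touch: move to back
  def put(k: K, v: V): Unit = {
    entries.remove(k)
    if (entries.size >= capacity) entries.remove(entries.head._1)  // evict oldest
    entries.put(k, v)
  }
}
```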
need to create the
connection within a mapPartitions code block to avoid the connection
setup/teardown overhead)?
I haven't done this myself though, so I'm just throwing the idea out
there.
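The per-partition connection idea mentioned above looks roughly like this
(a sketch: Connection is a dummy stand-in for a real client such as a
database or HTTP connection, and a plain Iterator stands in for the
partition that Spark's RDD.mapPartitions would hand you):

```scala
// Dummy connection; imagine expensive setup/teardown in the real thing.
class Connection {
  def send(x: Int): Int = x * 2
  def close(): Unit = ()
}

// One connection per partition instead of one per record. With Spark,
// this function would be passed to rdd.mapPartitions(processPartition).
def processPartition(records: Iterator[Int]): Iterator[Int] = {
  val conn = new Connection()
  val out = records.map(conn.send).toList  // materialize before closing
  conn.close()
  out.iterator
}
```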
On Fri, Aug 28, 2015 at 3:39 AM Hemminger Jeff j...@atware.co.jp
wrote:
Hi,
I am working on a Spark application that uses a large (~3G)
broadcast variable as a lookup table. The application refines the data in
this lookup table in an iterative manner. So this large variable is
broadcast many times during the lifetime of the application process.
From what I
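The last message is cut off, but for repeated broadcasts the usual advice
is to unpersist the previous generation before broadcasting the refined
table, so executors do not accumulate every copy of the ~3G variable. A
runnable sketch with a stand-in Broadcast class (with real Spark you would
call sc.broadcast(...) and bc.unpersist() instead):

```scala
// Stand-in for org.apache.spark.broadcast.Broadcast, so this runs locally.
class Broadcast[T](val value: T) {
  def unpersist(): Unit = ()  // real Spark frees executor-side copies here
}
def broadcast[T](v: T): Broadcast[T] = new Broadcast(v)

// Iteratively refine a lookup table, releasing each old generation
// before rebroadcasting the new one.
def refineLoop(initial: Map[String, Int], rounds: Int): Map[String, Int] = {
  var bc = broadcast(initial)
  for (_ <- 1 to rounds) {
    val refined = bc.value.map { case (k, v) => k -> (v + 1) }  // placeholder refinement
    bc.unpersist()  // release the old table before the next broadcast
    bc = broadcast(refined)
  }
  bc.value
}
```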