> val tables = sc.parallelize(...)
> tables.map(getTable).map(saveTable)
>
> On Wed, Mar 22, 2017 at 9:41 AM, Shashank Mandil <
> mandil.shash...@gmail.com> wrote:
>
> I am using Spark to dump data from MySQL into HDFS.
> The way I am doing this is by creating a Spark dataframe with the
> JDBC data source for each table and writing it out, one table at a
> time. There must be a better way to solve it.
>
> On Wed, Mar 22, 2017 at 9:34 AM, Shashank Mandil <
> mandil.shash...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am using Spark in yarn cluster mode.
>> When I run a yarn application it creates multiple
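As a concrete version of the tables.map(getTable).map(saveTable) suggestion quoted at the top, here is a minimal sketch. The table list, JDBC URL, credentials, and output path are all hypothetical, and the fan-out is done with a driver-side parallel collection rather than sc.parallelize, because SparkSession cannot be used inside executor-side map functions:

import org.apache.spark.sql.SparkSession

object DumpTables {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("DumpTables").getOrCreate()

    // Hypothetical table list; in practice it could come from
    // information_schema or the program arguments.
    val tables = Seq("users", "orders", "payments")

    // getTable: load one MySQL table as a DataFrame over JDBC
    // (the MySQL JDBC driver must be on the classpath).
    def getTable(name: String) =
      spark.read.format("jdbc")
        .option("url", "jdbc:mysql://db-host:3306/mydb") // hypothetical URL
        .option("dbtable", name)
        .option("user", "dbuser")
        .option("password", "dbpass")
        .load()

    // saveTable: write the table out to HDFS as Parquet.
    def saveTable(name: String): Unit =
      getTable(name).write.mode("overwrite").parquet(s"hdfs:///dumps/$name")

    // Dump the tables concurrently from the driver; each JDBC read is
    // itself distributed across the executors.
    tables.par.foreach(saveTable)
  }
}

Each read/write pair is a separate Spark job, so the parallel collection just overlaps them; the cluster scheduler still decides how much actually runs at once.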
Hi All,
I have a Spark dataframe which has 992 rows in it.
When I run a map over this dataframe I expect the map function to be
applied to all 992 rows.
Since the map runs on executors across the cluster, I did a distributed
count of the number of rows the map function actually processes:
dataframe.map(r => ...)
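For reference, one way to take that distributed count is with a LongAccumulator, sketched below. The spark.range dataframe is a stand-in for the real 992-row frame and the accumulator name is made up; note also that accumulators can over-count when tasks are retried, so the value is a diagnostic rather than an exact guarantee:

import org.apache.spark.sql.SparkSession

object CountMappedRows {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("CountMappedRows").getOrCreate()
    import spark.implicits._

    val df = spark.range(992).toDF("id") // stand-in for the 992-row frame

    // A LongAccumulator is visible on the driver after the job runs,
    // even though it is incremented inside executor-side tasks.
    val seen = spark.sparkContext.longAccumulator("rowsSeen")

    val mapped = df.as[Long].map { r => seen.add(1); r }
    mapped.count() // an action forces the lazy map to actually execute

    println(s"rows processed by map: ${seen.value}") // expect 992
  }
}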
> > Best regards,
> > Jacek Laskowski
> >
> > https://medium.com/@jaceklaskowski/
> > Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> > Follow me at https://twitter.com/jaceklaskowski
> >
> >
> > On Fri, Feb 3, 2017 at 8:06 PM, S
Hi All,
I wrote a test script which always throws an exception, as below:
import org.apache.spark.SparkConf

object Test {
  def main(args: Array[String]): Unit = {
    try {
      val conf = new SparkConf().setAppName("Test") // built but never used here
      throw new RuntimeException("Some Exception")
      println("all done!") // unreachable: the line above always throws
    } catch {
      // The original message was truncated here; a handler that logs and
      // rethrows keeps the "always throws an exception" behaviour.
      case e: Exception =>
        println(s"caught: ${e.getMessage}")
        throw e
    }
  }
}
Hi Aakash,
I think what it generally means is that you have to use the general Spark
DataFrame APIs to bring in the data and crunch the numbers; however, you
cannot use the KMeans clustering algorithm that is already present in the
MLlib Spark library.
I think a good place to start would be
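To make that concrete, below is a minimal sketch of Lloyd's k-means algorithm written against plain Spark primitives instead of MLlib's KMeans. The sample points, k = 2, and the fixed iteration count are all made up, and it uses RDD operations for brevity; the same assign-then-average step can also be phrased with DataFrame groupBy/agg:

import org.apache.spark.sql.SparkSession

object ManualKMeans {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ManualKMeans").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical 2-D points; in practice these would come from a DataFrame.
    val points = sc.parallelize(Seq(
      (1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5))).cache()

    // Start from k = 2 randomly sampled points.
    var centers = points.takeSample(withReplacement = false, 2)

    for (_ <- 1 to 10) { // fixed iteration count instead of a convergence test
      // Assignment step: tag each point with the index of its nearest center.
      val assigned = points.map { p =>
        val best = centers.indices.minBy { i =>
          val (cx, cy) = centers(i)
          math.pow(p._1 - cx, 2) + math.pow(p._2 - cy, 2)
        }
        (best, (p, 1))
      }
      // Update step: average the points of each cluster to get new centers.
      centers = assigned
        .reduceByKey { case (((x1, y1), n1), ((x2, y2), n2)) =>
          ((x1 + x2, y1 + y2), n1 + n2)
        }
        .map { case (_, ((sx, sy), n)) => (sx / n, sy / n) }
        .collect()
    }

    centers.foreach(println)
  }
}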