Hi Spark Devs and Users, BerkeleyX and Databricks are currently developing
two Spark-related MOOCs on edX (intro
https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x,
ml
https://www.edx.org/course/scalable-machine-learning-uc-berkeleyx-cs190-1x),
the first of which
**Learning the ropes**
I'm trying to grasp the concept of using the pipeline in pySpark...
Simplified example:
list=[(1,alpha),(1,beta),(1,foo),(1,alpha),(2,alpha),(2,alpha),(2,bar),(3,foo)]
Desired outcome:
[(1,3),(2,2),(3,1)]
Basically for each key, I want the number of unique values.
I've
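For the distinct-values-per-key question above, one possible approach is to drop duplicate (key, value) pairs first and then count what remains per key. The sketch below shows that logic in plain Python so it runs without a cluster; the function name `distinct_values_per_key` is made up for illustration, and the bare words in the original list are assumed to be strings. On a PySpark RDD of the same pairs, the equivalent would likely be `rdd.distinct().countByKey()`.

```python
from collections import Counter

def distinct_values_per_key(pairs):
    """Count the number of unique values seen for each key."""
    # Step 1: drop duplicate (key, value) pairs, similar to rdd.distinct().
    unique_pairs = set(pairs)
    # Step 2: count the remaining pairs per key, similar to countByKey().
    counts = Counter(key for key, _ in unique_pairs)
    return sorted(counts.items())

# The sample data from the question, with the values assumed to be strings.
pairs = [(1, "alpha"), (1, "beta"), (1, "foo"), (1, "alpha"),
         (2, "alpha"), (2, "alpha"), (2, "bar"), (3, "foo")]

result = distinct_values_per_key(pairs)
print(result)  # [(1, 3), (2, 2), (3, 1)]
```

Deduplicating before counting keeps the per-key work to a simple count, which is why the two-step form tends to be preferred over grouping all values per key and measuring each group.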
Hi,
Let me reword your request so you understand how (too) generic your question
is
Hi, I have $10,000, please find me some means of transportation so I can get
to work.
Please provide (a lot) more details. If you can't, consider using one of the
pre-built express VMs from either
know if you need any further information and if you don't
know
please drive across with the $1 to Sir Paco Nathan and get me the
answer.
Thanks and Regards,
Sudipta
On Thu, Jan 22, 2015 at 5:33 PM, Marco Shaw marco.s...@gmail.com
wrote:
Hi,
Let me reword your request so
(Starting over...)
The best place to look for the requirements would be at the individual
pages of each technology.
As for absolute minimum requirements, I would suggest 50GB of disk space
and at least 8GB of memory. This is the absolute minimum.
Architecting a solution like you are looking
Pretty vague on details:
http://www.datasciencecentral.com/m/blogpost?id=6448529%3ABlogPost%3A227199
On Jan 9, 2015, at 11:39 AM, Jaonary Rabarisoa jaon...@gmail.com wrote:
Hi all,
Deep learning algorithms are popular and achieve state-of-the-art
performance in several real-world
When it is ready.
On Dec 16, 2014, at 11:43 PM, 张建轶 zhangjia...@youku.com wrote:
Hi!
When will Spark 1.3.0 be released?
I want to use the new LDA feature.
Thank you!
First thing... Go into the Cloudera Manager and make sure that the Spark
service (master?) is started.
Marco
On Thu, Jul 24, 2014 at 7:53 AM, Sameer Sayyed sam.sayyed...@gmail.com
wrote:
Hello All,
I am a new user of Spark. I am using *cloudera-quickstart-vm-5.0.0-0-vmware*
to execute
I'm a Spark and HDInsight novice, so I could be wrong...
HDInsight is based on HDP2, so my guess here is that you have the option of
installing/configuring Spark in cluster mode (YARN) or in standalone mode
and packaging the Spark binaries with your job.
Everything I seem to look at is related to
Looks like going with cluster mode is not a good idea:
http://azure.microsoft.com/en-us/documentation/articles/hdinsight-administer-use-management-portal/
Seems like a non-HDInsight VM might be needed to make it the Spark master
node.
Marco
On Mon, Jul 14, 2014 at 12:43 PM, Marco Shaw marco.s
Can you provide links to the sections that are confusing?
My understanding is that the HDP1 binaries do not need YARN, while the HDP2
binaries do.
Now, you can also install Hortonworks Spark RPM...
For production, in my opinion, RPMs are better for manageability.
On Jul 6, 2014, at 5:39 PM,
installation
needed.
And this is confusing for me... do I need an RPM installation or not?...
Thank you,
Konstantin Kudryavtsev
On Sun, Jul 6, 2014 at 10:56 PM, Marco Shaw marco.s...@gmail.com wrote:
Can you provide links to the sections that are confusing?
My understanding is that the HDP1
They are recorded... For example, 2013: http://spark-summit.org/2013
I'm assuming the 2014 videos will be up in 1-2 weeks.
Marco
On Tue, Jul 1, 2014 at 3:18 PM, Soumya Simanta soumya.sima...@gmail.com
wrote:
Are these sessions recorded?
On Tue, Jul 1, 2014 at 9:47 AM, Alexis Roos
Dean: Some interesting information... Do you know where I can read more about
these coming changes to Scalding/Cascading?
On Jun 27, 2014, at 9:40 AM, Dean Wampler deanwamp...@gmail.com wrote:
... and to be clear on the point, Summingbird is not limited to MapReduce. It
abstracts over
Sorry. Never mind... I guess that's what Summingbird is all about. Never
heard of it.
On Jun 27, 2014, at 7:10 PM, Marco Shaw marco.s...@gmail.com wrote:
Dean: Some interesting information... Do you know where I can read more about
these coming changes to Scalding/Cascading?
On Jun 27
About run-example, I've tried MapR, Hortonworks and Cloudera distributions with
their Spark packages and none seem to package it.
Am I missing something? Is this only provided with the Spark project pre-built
binaries or from source installs?
Marco
On May 22, 2014, at 5:04 PM, Stephen
Hi,
I've wanted to play with Spark. I wanted to fast track things and just use
one of the vendor's express VMs. I've tried Cloudera CDH 5.0 and
Hortonworks HDP 2.1.
I've not written down all of my issues, but for certain, when I try to run
spark-shell it doesn't work. Cloudera seems to crash,
17 matches