Hi Sujit,
I just wanted to access public datasets on Amazon. Do I still need to provide
the keys?
Thank you,
From: Sujit Pal [mailto:sujitatgt...@gmail.com]
Sent: Tuesday, July 14, 2015 3:14 PM
To: Pagliari, Roberto
Cc: user@spark.apache.org
Subject: Re: Spark on EMR with S3 example (Python)
Is there an example about how to load data from a public S3 bucket in Python? I
haven't found any.
Thank you,
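A minimal sketch of one way to read a public bucket without supplying keys, assuming the hadoop-aws/s3a connector is on the classpath; the bucket and prefix below are purely illustrative:

from pyspark import SparkContext

sc = SparkContext(appName="public-s3-example")

# Anonymous credentials let Spark read world-readable objects without access keys
# (assumes an s3a-capable Hadoop build).
sc._jsc.hadoopConfiguration().set(
    "fs.s3a.aws.credentials.provider",
    "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider")

# Hypothetical public bucket and prefix, for illustration only.
lines = sc.textFile("s3a://some-public-bucket/some/prefix/part-*")
print(lines.take(5))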
I'm following the tutorial about Apache Spark on EC2. The output is the
following:
$ ./spark-ec2 -i ../spark.pem -k spark --copy launch spark-training
Setting up security groups...
Searching for existing cluster spark-training...
Latest Spark AMI: ami-19474270
Launching
With the Python API, the available arguments I got (using the inspect module) are
the following:
['cls', 'data', 'iterations', 'step', 'miniBatchFraction', 'initialWeights',
'regParam', 'regType', 'intercept']
numClasses is not available. Can someone comment on this?
Thanks,
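If that argument list comes from LogisticRegressionWithSGD.train (an assumption based on the names), the SGD variant only supports binary classification. A hedged sketch of the LBFGS variant, which exposes numClasses in later 1.x releases:

import inspect
from pyspark.mllib.classification import LogisticRegressionWithLBFGS

# Inspect the available arguments the same way as above.
print(inspect.getargspec(LogisticRegressionWithLBFGS.train).args)

# Hypothetical multiclass call; `training` would be an RDD of LabeledPoint.
# model = LogisticRegressionWithLBFGS.train(training, numClasses=3)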
Suppose I have something like the code below
for idx in xrange(0, 10):
    train_test_split = training.randomSplit(weights=[0.75, 0.25])
    train_cv = train_test_split[0]
    test_cv = train_test_split[1]
    # scale train_cv and test_cv
Is there a way to do this by scaling the values while preserving the original ones?
Thank you,
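One possible approach, sketched under the assumption that training is an RDD of LabeledPoint (as in the loop above): fit a StandardScaler on the training split and build new, scaled RDDs, leaving train_cv and test_cv untouched.

from pyspark.mllib.feature import StandardScaler
from pyspark.mllib.regression import LabeledPoint

train_cv, test_cv = training.randomSplit(weights=[0.75, 0.25])

# Fit the scaler on the training features only; withMean=True assumes dense vectors.
scaler = StandardScaler(withMean=True, withStd=True).fit(
    train_cv.map(lambda p: p.features))

def rescaled(split):
    # Returns a new RDD of scaled LabeledPoints; the original split is preserved.
    labels = split.map(lambda p: p.label)
    features = scaler.transform(split.map(lambda p: p.features))
    return labels.zip(features).map(lambda lf: LabeledPoint(lf[0], lf[1]))

train_cv_scaled = rescaled(train_cv)
test_cv_scaled = rescaled(test_cv)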
From: Sven Krasser [mailto:kras...@gmail.com]
Sent: Friday, April 24, 2015 5:56 PM
To: Pagliari, Roberto
Cc: user@spark.apache.org
Subject: Re: indexing an RDD [Python]
The solution depends largely on your use case. I assume the index is in the
key
I have an RDD of LabeledPoints.
Is it possible to select a subset of it based on a list of indices?
For example with idx=[0,4,5,6,8], I'd like to be able to create a new RDD with
elements 0,4,5,6 and 8.
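A hedged sketch of one way to do this when the RDD has no explicit key: zipWithIndex assigns each element a stable position, which can then be filtered against the index list (the name labeled_points is illustrative).

idx = set([0, 4, 5, 6, 8])

subset = (labeled_points.zipWithIndex()          # (LabeledPoint, position) pairs
          .filter(lambda pair: pair[1] in idx)   # keep only the wanted positions
          .map(lambda pair: pair[0]))            # drop the position again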
Can anybody point me to an example, if available, of grid search with Python?
Thank you,
I know grid search with cross validation is not supported. However, I was
wondering if there is something available for the time being.
Thanks,
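As a stopgap, a manual grid search over a held-out split can be written by hand. A minimal sketch, assuming sc already exists (e.g. in the pyspark shell); the estimator, parameter grid, dataset, and error metric below are just illustrative:

from pyspark.mllib.classification import LogisticRegressionWithSGD
from pyspark.mllib.regression import LabeledPoint

# Tiny illustrative dataset; in practice this would be the real training RDD.
training = sc.parallelize([
    LabeledPoint(0.0, [0.0, 1.0]),
    LabeledPoint(1.0, [2.0, 0.5]),
    LabeledPoint(0.0, [0.1, 1.2]),
    LabeledPoint(1.0, [1.8, 0.3]),
] * 25)

train, valid = training.randomSplit([0.8, 0.2], seed=42)
n_valid = float(valid.count())

best_param, best_err = None, float("inf")
for reg_param in [0.01, 0.1, 1.0]:
    model = LogisticRegressionWithSGD.train(train, regParam=reg_param)
    err = (valid.map(lambda p: (p.label, model.predict(p.features)))
                .filter(lambda lp: lp[0] != lp[1])
                .count()) / n_valid
    if err < best_err:
        best_param, best_err = reg_param, err
print("best regParam=%s, validation error=%s" % (best_param, best_err))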
From: Punyashloka Biswal [mailto:punya.bis...@gmail.com]
Sent: Thursday, April 23, 2015 9:06 PM
To: Pagliari, Roberto; user@spark.apache.org
Subject
Is there a way to set the cost value C when using linear SVM?
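A hedged note and sketch: MLlib's linear SVM exposes a regularization weight (regParam) rather than a C value, and regParam is roughly inversely related to C in the libsvm-style formulation, so a smaller regParam corresponds to a larger C.

from pyspark.mllib.classification import SVMWithSGD

# `training` is assumed to be an RDD of LabeledPoint; the values are illustrative.
model = SVMWithSGD.train(training, iterations=100, regParam=0.01)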
I'm executing this example from the documentation (in single node mode)
# sc is an existing SparkContext.
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
# Queries can be expressed in HiveQL.
results =
I'm getting this error when importing HiveContext:
from pyspark.sql import HiveContext
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/spark-1.1.0/python/pyspark/__init__.py", line 63, in <module>
from pyspark.context import SparkContext
File
I'm running the latest version of Spark with Hadoop 1.x, Scala 2.9.3, and
Hive 0.9.0.
When using Python 2.7:
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
I'm getting 'sc not defined'
On the other hand, I can see 'sc' from pyspark CLI.
Is there a way to fix it?
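For a standalone script (outside the pyspark shell, which creates sc automatically), a minimal sketch is to construct the SparkContext yourself before building the HiveContext:

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = SparkConf().setAppName("hive-context-example")
sc = SparkContext(conf=conf)   # the shell does this for you; a script must do it itself
sqlContext = HiveContext(sc)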
I'm using this system
Hadoop 1.0.4
Scala 2.9.3
Hive 0.9.0
with Spark 1.1.0. When importing pyspark, I'm getting this error:
from pyspark.sql import *
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/path/spark-1.1.0/python/pyspark/__init__.py", line 63, in ?
from
I also didn't realize I was trying to bring up the Secondary NameNode as a slave;
that might be an issue as well.
Thanks,
From: Yana Kadiyska [mailto:yana.kadiy...@gmail.com]
Sent: Thursday, October 30, 2014 11:27 AM
To: Pagliari, Roberto
Cc: user@spark.apache.org
Subject: Re: problem with start
Pagliari, Roberto <rpagli...@appcomsci.com> wrote:
I ran sbin/start-master.sh followed by sbin/start-slaves.sh (I built with the -Phive
option to be able to interface with Hive).
I’m getting this
ip_address: org.apache.spark.deploy.worker.Worker running as process . Stop it first.
Is there a repo or some kind of instruction about how to install sbt for centos?
Thanks,
I ran sbin/start-master.sh followed by sbin/start-slaves.sh (I built with the -Phive
option to be able to interface with Hive).
I'm getting this
ip_address: org.apache.spark.deploy.worker.Worker running as process . Stop
it first.
Am I doing something wrong? In my specific case, shark+hive is
If I already have Hive running on Hadoop, do I need to build Spark with Hive support using the
sbt/sbt -Phive assembly/assembly
command?
If the answer is no, how do I tell Spark where the Hive installation is?
Thanks,
I'm a newbie with Spark. After installing it on all the machines I want to use,
do I need to tell it about the Hadoop configuration, or will it be able to find it
by itself?
Thank you,