Could you please provide the jstack output? That would help the devs
identify the blocking operation more easily.
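For example (the pid is a placeholder; jps -l lists the running JVMs):

jstack <driver-pid> > driver-threads.txt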
On Thu, Oct 29, 2015 at 6:54 PM, 陈宇航 wrote:
> I tried to use SparkLauncher (org.apache.spark.launcher.SparkLauncher) to
> submit a Spark Streaming job,
Actually, Hadoop InputFormats can still be used to read and write from
file://, s3n://, and similar schemes. You just won't be able to
read/write to HDFS without installing Hadoop and setting up an HDFS cluster.
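A minimal PySpark illustration (paths and bucket name are placeholders;
s3n:// additionally needs your AWS credentials configured):

sc.textFile("file:///tmp/input.txt").count()
sc.textFile("s3n://my-bucket/input.txt").count()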
To summarize: Sourav, you can use any of the prebuilt packages (i.e.
anything other
> (ReflectionUtils.java:106)
> ... 83 more
On Mon, Jun 29, 2015 at 10:02 AM, Jey Kottalam j...@cs.berkeley.edu
wrote:
> Actually, Hadoop InputFormats can still be used to read and write from
> file://, s3n://, and similar schemes. You just won't be able to
> read/write to HDFS without installing Hadoop
-csv_2.11:1.1.0 or com.databricks.spark.csv_2.11.1.1.0 I get a
class-not-found error. With com.databricks.spark.csv I don't get the
class-not-found error, but I still get the previous error even after
using file:/// in the URI.
Regards,
Sourav
On Mon, Jun 29, 2015 at 1:13 PM, Jey Kottalam j
> (ReflectionUtils.java:106)
> ... 83 more
Regards,
Sourav
On Mon, Jun 29, 2015 at 6:53 PM, Jey Kottalam j...@cs.berkeley.edu wrote:
The format is still com.databricks.spark.csv, but the parameter passed
to spark-shell is --packages com.databricks:spark-csv_2.11:1.1.0.
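For example, from PySpark (assuming Spark 1.4+, where bin/pyspark also
accepts --packages; the input path is a placeholder):

bin/pyspark --packages com.databricks:spark-csv_2.11:1.1.0

df = sqlContext.read.format("com.databricks.spark.csv") \
    .option("header", "true") \
    .load("file:///path/to/data.csv")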
On Mon, Jun 29, 2015 at 2:59 PM
Hi Sourabh, could you try it with the stable 2.4 version of IPython?
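For example, pinning it with pip (the exact point release may need
adjusting for your environment):

pip install "ipython==2.4.1"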
On Thu, Feb 26, 2015 at 8:54 PM, sourabhguha sourabh.g...@hotmail.com wrote:
> http://apache-spark-user-list.1001560.n3.nabble.com/file/n21843/pyspark_error.jpg
> I get the above error when I try to run pyspark with the ipython
Hi Sathish,
The current implementation of countByKey uses reduceByKey:
https://github.com/apache/spark/blob/v1.2.1/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L332
It seems that countByKey is mostly deprecated:
https://issues.apache.org/jira/browse/SPARK-3994
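In PySpark terms, the linked Scala implementation corresponds roughly to
this sketch:

# what rdd.countByKey() computes, written out via reduceByKey
counts = (rdd.mapValues(lambda _: 1)
             .reduceByKey(lambda a, b: a + b)
             .collectAsMap())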
-Jey
On Tue,
Hi Aris,
A simple approach to gaining some of the benefits of an RBF kernel is
to add synthetic features to your training set. For example, if your
original data consists of 3-dimensional vectors [x, y, z], you could
compute a new 9-dimensional feature vector containing [x, y, z, x^2,
y^2, z^2, x*y, x*z, y*z].
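A quick NumPy sketch of that expansion (training_vectors is a
hypothetical list of 3-dimensional inputs):

import numpy as np

def expand(v):
    x, y, z = v
    # original features plus squares and pairwise products
    return np.array([x, y, z, x*x, y*y, z*z, x*y, x*z, y*z])

expanded = np.array([expand(v) for v in training_vectors])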
I think you have to explicitly list the ephemeral disks in the device
map when launching the EC2 instance.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/block-device-mapping-concepts.html
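A sketch using boto 2 (AMI id, region, instance type, and device names
are placeholders):

from boto.ec2 import connect_to_region
from boto.ec2.blockdevicemapping import BlockDeviceMapping, BlockDeviceType

# list the instance-store volumes explicitly in the device map
bdm = BlockDeviceMapping()
bdm['/dev/sdb'] = BlockDeviceType(ephemeral_name='ephemeral0')
bdm['/dev/sdc'] = BlockDeviceType(ephemeral_name='ephemeral1')

conn = connect_to_region('us-east-1')
conn.run_instances('ami-xxxxxxxx', instance_type='m3.xlarge',
                   block_device_map=bdm)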
On Tue, Aug 19, 2014 at 11:54 AM, Andras Barjak
andras.bar...@lynxanalytics.com wrote:
> Hi,
> Using the
Hi Ben,
Has the PYSPARK_PYTHON environment variable been set in
spark/conf/spark-env.sh to the path of the new python binary?
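For example, a line like the following in spark/conf/spark-env.sh (the
interpreter path here is just a placeholder):

export PYSPARK_PYTHON=/usr/local/bin/python2.7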
FYI, there's a /root/copy-dirs script that can be handy when updating
files on an already-running cluster. You'll want to restart the Spark
cluster for the changes to take effect.
Hi Abhishek,
> Where mapreduce is taking 2 mins, spark is taking 5 min to complete the
> job.
Interesting. Could you tell us more about your program? A code skeleton
would certainly be helpful.
Thanks!
-Jey
On Tue, Jun 17, 2014 at 3:21 PM, abhiguruvayya sharath.abhis...@gmail.com
wrote:
> I did
Hi Rahul,
Marcelo's explanation is correct. Here's a possible approach to your
program, in pseudo-Python:
# connect to the Spark cluster
sc = SparkContext(...)
# load the input data (load_xls stands in for your XLS parser)
input_data = load_xls(open("input.xls"))
input_rows = input_data['Sheet1'].rows
# create an RDD on the cluster from the parsed rows
input_rdd = sc.parallelize(input_rows)
Sorry, but I don't know where Cloudera puts the executor log files.
Maybe their docs give the correct path?
On Fri, Apr 25, 2014 at 12:32 PM, Joe L selme...@yahoo.com wrote:
> Hi, thank you for your reply, but I could not find it. It says no such
> file or directory.