Hi
I am getting the following error when persisting an RDD in Parquet format to
an S3 location. This code was working in version 1.2; it fails in version
1.3.1.
Any help is appreciated.
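For reference, a minimal sketch of the kind of write that triggers this, assuming the Spark 1.3.x DataFrame API and a live cluster; the `Event` case class, `sc`, and the bucket path are placeholders, not from the original report:

```scala
// Sketch only; assumes a running SparkContext `sc` (e.g. in spark-shell).
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Placeholder schema and data standing in for the real RDD.
case class Event(id: Long, name: String)
val df = sc.parallelize(Seq(Event(1L, "a"), Event(2L, "b"))).toDF()

// In 1.3.x the write goes through DataFrame.saveAsParquetFile,
// replacing the 1.2-era SchemaRDD call of the same name.
df.saveAsParquetFile("s3n://my-bucket/events.parquet")
```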
Caused by: java.lang.AssertionError: assertion failed: Conflicting
Hi
Could Spark-SQL be used from within a custom actor that acts as a receiver
for a streaming application? If yes, what is the recommended way of passing
the SparkContext to the actor?
Thanks for your help.
- Ranga
Just to close out this one, I noticed that the cache partition size was quite
low for each of the RDDs (1 - 14). Increasing the number of partitions
(~400) resolved this for me.
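The fix described above can be sketched as follows; the ~400 partition count is from the message, while the input path and storage level are illustrative assumptions:

```scala
// Illustrative sketch; assumes a live SparkContext `sc`.
import org.apache.spark.storage.StorageLevel

val rdd = sc.textFile("hdfs:///path/to/input")

// With too few partitions, each cached partition can be large enough
// to cause memory pressure; repartitioning spreads the data out.
val repartitioned = rdd.repartition(400)
repartitioned.persist(StorageLevel.MEMORY_AND_DISK)
```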
Hi
I am noticing that the RDDs that are persisted get cleaned up very quickly.
This usually happens in a matter of a few minutes. I tried setting a value
of 20 hours for the spark.cleaner.ttl property and still get the same
behavior.
In my use-case, I have to persist about 20 RDDs each of size
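Two details worth checking here, sketched below under the assumption of a Spark 1.x setup where spark.cleaner.ttl still exists: the TTL must be set on the conf before the context is created, and the ContextCleaner will unpersist an RDD once the driver no longer holds a reference to it, so long-lived RDDs need long-lived references. The app name, path, and buffer are placeholders:

```scala
// Sketch; assumes Spark 1.x where spark.cleaner.ttl existed.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// The TTL must be set before the context is created; setting it on a
// running context has no effect. 72000 seconds = 20 hours.
val conf = new SparkConf()
  .setAppName("long-lived-cache")
  .set("spark.cleaner.ttl", "72000")
val sc = new SparkContext(conf)

// Keeping strong references on the driver prevents the ContextCleaner
// from unpersisting RDDs after their references are garbage-collected.
val cached = scala.collection.mutable.ArrayBuffer.empty[RDD[String]]
val rdd = sc.textFile("hdfs:///path/to/data").persist()
cached += rdd
```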
Increasing the driver memory resolved this issue. Thanks to Nick for the
hint. Here is how I am starting the shell: spark-shell --driver-memory 4g
--driver-cores 4 --master local
Hi
I am new to Spark and trying to develop an application that loads data from
Hive. Here is my setup:
* Spark-1.1.0 (built using -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0
-Phive)
* Executing Spark-shell on a box with 16 GB RAM
* 4 Cores Single Processor
* OpenCSV library (SerDe)
* Hive table
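With that setup, loading the Hive table from the shell would look roughly like the following; this assumes the -Phive build above, and the table name `my_table` is a placeholder for the OpenCSV-backed table:

```scala
// Sketch for Spark 1.1.0 built with -Phive; `sc` is the shell's context.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// `my_table` is a placeholder for the CSV SerDe-backed Hive table.
val rows = hiveContext.sql("SELECT * FROM my_table LIMIT 10")
rows.collect().foreach(println)
```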