Hi Yuming - I was running into the same issue with larger worker nodes a few
weeks ago.
The way I managed to get around the high GC time, following others'
suggestions, was to break each worker node up into multiple workers of
around 10 GB each, dividing the node's cores accordingly.
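For reference, a sketch of how multiple workers per node can be configured in standalone mode (the specific memory and core counts below are illustrative, not from the thread):

```shell
# conf/spark-env.sh on each worker node (Spark standalone mode).
# Run 4 separate worker JVMs of ~10 GB each instead of one large worker,
# so each JVM has a smaller heap for the GC to manage.
export SPARK_WORKER_INSTANCES=4
export SPARK_WORKER_MEMORY=10g
export SPARK_WORKER_CORES=4   # cores per worker; divide the node's total accordingly
```

The workers need a restart after changing spark-env.sh for this to take effect.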
The other
Could you be more specific about how this is done?
The DataFrame class doesn't have that method.
On Sun, May 3, 2015 at 11:07 PM, ayan guha guha.a...@gmail.com wrote:
You can use a custom partitioner to redistribute the data using partitionBy.
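A sketch of what that might look like (the key types, RDD names, and partition count here are illustrative, not from the thread):

```scala
import org.apache.spark.HashPartitioner

// Partition both sides of the join with the same partitioner, so the
// join can run co-partitioned instead of re-shuffling both datasets.
val partitioner = new HashPartitioner(2000)  // partition count is illustrative

val left  = leftRdd.partitionBy(partitioner)   // leftRdd:  RDD[(K, V)]
val right = rightRdd.partitionBy(partitioner)  // rightRdd: RDD[(K, W)]

val joined = left.join(right)
```

As the follow-up message points out, this is a pair-RDD method; a Spark 1.3 DataFrame would have to drop down to its underlying RDD first.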
On 4 May 2015 15:37, Nick Travers n.e.trav...@gmail.com wrote:
I'm currently trying to join two large tables (on the order of 1B rows each)
using Spark SQL (1.3.0) and am running into long GC pauses which bring the
job to a halt.
I'm reading in both tables using a HiveContext with the underlying files
stored as Parquet Files. I'm using something along the lines of
what
you are writing since it is not BytesWritable / Text.
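For context, the read-and-join setup described earlier in the thread would look roughly like this in Spark 1.3 (the paths and join column are placeholders, not from the thread):

```scala
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)

// Paths and join key are illustrative placeholders.
val a = sqlContext.parquetFile("/path/to/tableA")
val b = sqlContext.parquetFile("/path/to/tableB")

// Join the two DataFrames on a shared key column.
val joined = a.join(b, a("id") === b("id"))
```

In 1.3 the shuffle behind this join is controlled by spark.sql.shuffle.partitions (default 200), which is often too low for billion-row tables.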
On Thu, Apr 2, 2015 at 3:40 AM, Nick Travers n.e.trav...@gmail.com
wrote:
I'm actually running this in a separate environment to our HDFS cluster.
I think I've been able to sort out the issue by copying
/opt/cloudera/parcels/CDH/lib
the executor) which gives the
java.lang.UnsatisfiedLinkError to see whether the libsnappy.so is in the
hadoop native lib path.
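One way to make the native libraries visible to Spark is via the library-path options; the parcel path below is a guess based on the truncated path above, so verify it on your own nodes:

```shell
# Make libsnappy.so / libhadoop.so visible to the driver and executor JVMs.
# The native-lib directory below is typical for CDH parcels, but is an
# assumption here - check where your distribution installs it.
NATIVE=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native

spark-submit \
  --driver-library-path "$NATIVE" \
  --conf spark.executor.extraLibraryPath="$NATIVE" \
  ...
```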
On Thursday, April 2, 2015 at 10:22 AM, Nick Travers wrote:
Thanks for the super quick response!
I can read the file just fine in Hadoop; it's just when I point Spark
Has anyone else encountered the following error when trying to read a
snappy-compressed sequence file from HDFS?
*java.lang.UnsatisfiedLinkError:
org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z*
The following works for me when the file is uncompressed:
import
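For comparison, a sketch of reading the sequence file with explicit key/value types (the path and Writable types here are placeholders, not the original code):

```scala
import org.apache.hadoop.io.{LongWritable, Text}

// Read a (possibly snappy-compressed) sequence file. The codec is read
// from the file header, but decompressing still requires libsnappy.so
// on the JVM's native library path - hence the UnsatisfiedLinkError.
val rdd = sc.sequenceFile("/hdfs/path/to/file",
                          classOf[LongWritable], classOf[Text])
  .map { case (k, v) => (k.get, v.toString) }  // copy out of reused Writables
```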
PM, Xianjin YE advance...@gmail.com wrote:
Can you read the snappy-compressed file in HDFS? It looks like libsnappy.so
is not in the Hadoop native lib path.
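A quick way to check this (assuming a Hadoop version recent enough to ship the `checknative` subcommand):

```shell
# Lists which native libraries the Hadoop install can load; look for
# "snappy: true" and the path to libsnappy.so in the output.
hadoop checknative -a
```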
On Thursday, April 2, 2015 at 10:13 AM, Nick Travers wrote:
Has anyone else encountered the following error when trying to read a
snappy
Hi List,
I'm following this example here
https://github.com/databricks/learning-spark/tree/master/mini-complete-example
with the following:
$SPARK_HOME/bin/spark-submit \
--deploy-mode cluster \
--master spark://host.domain.ex:7077 \
--class