How does $"something" actually work (from a scala perspective) as a free
column reference?
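For context, $ is not special syntax: Spark's SQLImplicits (pulled in by import spark.implicits._) defines a custom string interpolator through an implicit class on StringContext, roughly like this simplified sketch:

import org.apache.spark.sql.ColumnName

// Simplified from org.apache.spark.sql.SQLImplicits: wrapping StringContext
// in an implicit class gives every string literal a `$` interpolator that
// builds a ColumnName (a Column subclass) from the literal's text.
implicit class StringToColumn(val sc: StringContext) {
  def $(args: Any*): ColumnName = new ColumnName(sc.s(args: _*))
}

// So $"something" desugars to StringContext("something").$() and yields an
// unresolved column reference that the analyzer binds to a real column later.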
My setup is that I have a Spark master (using the Spark scheduler) and 32 workers registered with it, but they are on a private network. I can connect to that private network via OpenVPN.
I would like to be able to run Spark applications from a local (on my desktop) IntelliJ but have them use the …
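A minimal sketch of the driver-side configuration this usually takes (the master URL and the 10.8.0.6 VPN-side address below are made-up placeholders); the key point is that the workers must be able to reach the driver back over the VPN, which is what spark.driver.host advertises:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("from-intellij")
  .master("spark://10.0.0.1:7077")            // private-network master URL (placeholder)
  .config("spark.driver.host", "10.8.0.6")    // desktop's OpenVPN-side address (placeholder)
  .config("spark.driver.port", "35000")       // pin the port so the VPN can pass it through
  .getOrCreate()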
As a follow-up question, what happened to org.apache.spark.sql.parquet.RowWriteSupport? It seems like it would help me.
On Thu, Apr 19, 2018 at 9:23 PM, Christopher Piggott <cpigg...@gmail.com> wrote:
> I am trying to write some parquet files and running out of memory. I'm
> giving …
I am trying to write some parquet files and running out of memory. I'm
giving my workers each 16GB and the data is 102 columns * 65536 rows - not
really all that much. The content of each row is a short string.
I am trying to create the file by dynamically allocating a StructType of StructField …
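A sketch of that pattern under the stated shape (the column names, row values, and output path are made up):

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().appName("parquet-write").getOrCreate()

// Dynamically build a schema of 102 string columns.
val schema = StructType(
  (0 until 102).map(i => StructField(s"col$i", StringType, nullable = true)))

// 65536 rows of short strings; real data would come from elsewhere.
val rows = spark.sparkContext.parallelize(0 until 65536).map { r =>
  Row.fromSeq((0 until 102).map(c => s"r${r}c$c"))
}

spark.createDataFrame(rows, schema).write.parquet("hdfs:///tmp/example-output")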
Just for fun, I want to make a stupid program that makes different frequency chimes as each worker becomes active. That way you can 'hear' what the cluster is doing and how it's distributing work.
I thought to do this I would make a custom Sink, but the Sink and everything else in …
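For reference, the Sink trait is private[spark], so a custom sink conventionally has to live under an org.apache.spark package and expose the constructor the MetricsSystem invokes reflectively. A sketch against the Spark 2.x internals (not a stable API; the class name is made up):

package org.apache.spark.metrics.sink

import java.util.Properties
import com.codahale.metrics.MetricRegistry
import org.apache.spark.SecurityManager

// Spark 2.x instantiates sinks reflectively with this three-argument
// constructor; the trait itself only demands start/stop/report.
class ChimeSink(
    val property: Properties,
    val registry: MetricRegistry,
    securityMgr: SecurityManager) extends Sink {
  override def start(): Unit = { /* schedule polling of the registry */ }
  override def stop(): Unit = { /* cancel polling */ }
  override def report(): Unit = {
    // Map executor/task gauges in `registry` to chime frequencies here.
  }
}

It would then be wired up in metrics.properties, e.g. *.sink.chime.class=org.apache.spark.metrics.sink.ChimeSink.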
Hi,
def makeRDD[T](seq: Seq[(T, Seq[String])])(implicit arg0: ClassTag[T]): RDD[T]
list of tuples of data and location preferences (hostnames of Spark nodes)
Is that list a set of acceptable choices, from which it will pick one, or an ordered preference list? I'm trying to ascertain how …
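A small usage sketch of that overload (the hostnames are placeholders); per its scaladoc it creates one partition per element, preferring the listed hosts for that partition:

// sc is the SparkContext; worker hostnames below are hypothetical.
val rdd = sc.makeRDD(Seq(
  (1, Seq("worker01", "worker02")),   // partition 0 prefers either host
  (2, Seq("worker03")),               // partition 1 prefers worker03
  (3, Seq("worker01"))
))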
I have a top level directory in HDFS that contains nothing but subdirectories (no actual files). In each one of those subdirs is a combination of files and other subdirs:
/topdir/dir1/(lots of files)
/topdir/dir2/(lots of files)
/topdir/dir2/subdir/(lots of files)
I …
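Two common ways to pick those files up, sketched here with textFile (binaryFiles accepts the same glob syntax; whether the recursive flag suits a given input format is worth verifying):

// Option 1: glob one level of wildcards; directories matched by the glob
// have their files listed, so this covers dir1, dir2, and dir2/subdir.
val byGlob = spark.sparkContext.textFile("hdfs:///topdir/*/*")

// Option 2: ask the underlying FileInputFormat to recurse on its own.
spark.sparkContext.hadoopConfiguration.set(
  "mapreduce.input.fileinputformat.input.dir.recursive", "true")
val recursive = spark.sparkContext.textFile("hdfs:///topdir")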
I have been searching for examples, but not finding exactly what I need.
I am looking for the paradigm for using Spark 2.2 to convert a bunch of binary files into a bunch of different binary files. I'm starting with:
val files = spark.sparkContext.binaryFiles("hdfs://1.2.3.4/input")
then …
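A sketch of where that starting point typically goes; transform and the /output path are hypothetical stand-ins for the real conversion:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical per-file conversion; replace with the real logic.
def transform(in: Array[Byte]): Array[Byte] = in.reverse

val files = spark.sparkContext.binaryFiles("hdfs://1.2.3.4/input")

files.foreach { case (name, stream) =>
  val out = transform(stream.toArray())   // whole file as Array[Byte]
  // Write one output file per input file via the Hadoop FileSystem API.
  val fs = FileSystem.get(new URI("hdfs://1.2.3.4"), new Configuration())
  val os = fs.create(new Path("/output", new Path(name).getName))
  try os.write(out) finally os.close()
}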
I need to set java.library.path to get access to some native code. Following directions, I made a spark-env.sh:
#!/usr/bin/env bash
export LD_LIBRARY_PATH="/usr/local/lib/libcdfNativeLibrary.so:/usr/local/lib/libcdf.so:${LD_LIBRARY_PATH}"
export …
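An alternative that skips spark-env.sh: point Spark's library-path settings at the directory that contains the .so files (these two configuration keys are standard Spark properties; /usr/local/lib is from the snippet above):

import org.apache.spark.SparkConf

// java.library.path / LD_LIBRARY_PATH entries are directories, not
// individual .so files, so pass the directory holding libcdf.so.
val conf = new SparkConf()
  .set("spark.driver.extraLibraryPath", "/usr/local/lib")
  .set("spark.executor.extraLibraryPath", "/usr/local/lib")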
I'm looking to run a job that involves a zillion files in a format called CDF, a NASA standard. There are a number of libraries out there that can read CDFs, but most of them are not high quality compared to the official NASA one, which has Java bindings (via JNI). It's a little clumsy but I have …