Free Column Reference with $

2018-05-04 Thread Christopher Piggott
How does $"something" actually work (from a Scala perspective) as a free column reference?
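The interpolator comes from the implicit class StringToColumn inside org.apache.spark.sql.SQLImplicits, pulled into scope by import spark.implicits._. A simplified sketch of the mechanism (paraphrased from the 2.x source, so treat the details as approximate; in Spark the class lives inside SQLImplicits rather than at the top level):

    import org.apache.spark.sql.ColumnName

    // Scala desugars $"name" into new StringContext("name").$(),
    // so an implicit class on StringContext can supply the $ method.
    implicit class StringToColumn(val sc: StringContext) {
      def $(args: Any*): ColumnName = new ColumnName(sc.s(args: _*))
    }

Since ColumnName extends Column, $"age" can be used anywhere a Column is expected, e.g. df.select($"age").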

Running apps over a VPN

2018-05-02 Thread Christopher Piggott
My setup is that I have a Spark master (using the Spark scheduler) and 32 workers registered with it, but they are on a private network. I can connect to that private network via OpenVPN. I would like to be able to run Spark applications from a local IntelliJ (on my desktop) but have them use the ...
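For a driver running outside the cluster, a sketch of the settings that usually matter (the VPN address, ports, and master URL here are assumptions for illustration):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("vpn-driver")
      .master("spark://master.internal:7077")    // assumed master URL
      .config("spark.driver.host", "10.8.0.6")   // VPN address the workers can reach back on
      .config("spark.driver.port", "7078")       // pin ports so the VPN/firewall can pass them
      .config("spark.blockManager.port", "7079")
      .getOrCreate()

The workers open connections back to the driver, so spark.driver.host must be an address routable from the private network, not the desktop's LAN address.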

Re: Stream writing parquet files

2018-04-19 Thread Christopher Piggott
As a follow-up question, what happened to org.apache.spark.sql.parquet.RowWriteSupport? It seems like it would help me. ...
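As far as I can tell, RowWriteSupport belonged to the old Spark 1.x parquet code path; in 2.x the write-support classes were refactored and made internal, leaving DataFrameWriter as the supported route. A sketch, with the partition count an arbitrary illustration:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    val df = spark.range(0, 1000000).toDF("id")   // stand-in data

    // Repartition to bound how much each task buffers before flushing a file.
    df.repartition(8)
      .write
      .option("compression", "snappy")
      .parquet("hdfs://1.2.3.4/output")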

Stream writing parquet files

2018-04-19 Thread Christopher Piggott
I am trying to write some parquet files and running out of memory. I'm giving my workers 16GB each, and the data is 102 columns * 65536 rows - not really all that much. The content of each row is a short string. I am trying to create the file by dynamically allocating a StructType of StructField ...
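Building the schema dynamically is straightforward; a self-contained sketch with placeholder data (the column and row counts are from the post):

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val spark = SparkSession.builder().getOrCreate()

    // 102 string columns, built programmatically.
    val schema = StructType(
      (1 to 102).map(i => StructField(s"c$i", StringType, nullable = false)))

    // Stand-in rows for the real short-string content.
    val rows = spark.sparkContext.parallelize(
      (1 to 65536).map(r => Row.fromSeq((1 to 102).map(c => s"r${r}c$c"))))

    spark.createDataFrame(rows, schema).write.parquet("hdfs://1.2.3.4/out")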

Custom metrics sink

2018-03-16 Thread Christopher Piggott
Just for fun, I want to make a stupid program that makes different-frequency chimes as each worker becomes active. That way you can 'hear' what the cluster is doing and how it's distributing work. To do this, I thought I would make a custom Sink, but the Sink and everything else in ...
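In 2.x the Sink trait is private[spark], so a custom sink seems to need to live in the org.apache.spark.metrics.sink package, where Spark instantiates it reflectively. A hypothetical sketch; the constructor signature below is what the 2.x MetricsSystem looks for, as far as I can tell:

    package org.apache.spark.metrics.sink   // Sink is private[spark]

    import java.util.Properties
    import com.codahale.metrics.MetricRegistry
    import org.apache.spark.SecurityManager

    // Hypothetical "chime" sink, constructed reflectively by Spark with
    // exactly this (Properties, MetricRegistry, SecurityManager) shape.
    class ChimeSink(props: Properties,
                    registry: MetricRegistry,
                    securityMgr: SecurityManager) extends Sink {
      override def start(): Unit = ()   // e.g. schedule a poller here
      override def stop(): Unit = ()
      override def report(): Unit = {
        // Map metric values to tone frequencies here; println is a placeholder.
        println(s"metrics snapshot: ${registry.getNames}")
      }
    }

It would then be registered in conf/metrics.properties, e.g. *.sink.chime.class=org.apache.spark.metrics.sink.ChimeSink.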

Spark MakeRDD preferred workers

2018-01-08 Thread Christopher Piggott
Hi, def makeRDD[T](seq: Seq[(T, Seq[String])])(implicit arg0: ClassTag[T]): RDD[T] takes a "list of tuples of data and location preferences (hostnames of Spark nodes)". Is that a list of acceptable choices from which it will pick one, or is it an ordered list? I'm trying to ascertain how ...
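From what I can tell of the scheduler, the inner Seq is an unordered set of preferred hosts rather than an ordered fallback list: any of them satisfies locality, and the task can still run elsewhere if none has a free slot. A minimal example (hostnames are assumptions for illustration):

    import org.apache.spark.sql.SparkSession

    val sc = SparkSession.builder().getOrCreate().sparkContext

    // One partition per element; the Seq[String] is that partition's
    // preferred hosts.
    val rdd = sc.makeRDD(Seq(
      ("block-a", Seq("worker01", "worker02")),
      ("block-b", Seq("worker03"))
    ))

    rdd.partitions.foreach(p => println(rdd.preferredLocations(p)))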

binaryFiles() on directory full of directories

2018-01-08 Thread Christopher Piggott
I have a top-level directory in HDFS that contains nothing but subdirectories (no actual files). Each of those subdirs contains a combination of files and other subdirs: /topdir/dir1/(lots of files) /topdir/dir2/(lots of files) /topdir/dir2/subdir/(lots of files) I ...
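One commonly suggested approach, untested here, is the recursive-listing flag that FileInputFormat honors when listing input directories; binaryFiles() goes through a FileInputFormat subclass, so it should pick the flag up:

    import org.apache.spark.sql.SparkSession

    val sc = SparkSession.builder().getOrCreate().sparkContext

    // Ask the Hadoop input layer to descend into subdirectories.
    sc.hadoopConfiguration.set(
      "mapreduce.input.fileinputformat.input.dir.recursive", "true")

    val files = sc.binaryFiles("hdfs://1.2.3.4/topdir")
    files.keys.take(10).foreach(println)   // (path, PortableDataStream) pairs

An alternative is globbing each depth explicitly, e.g. binaryFiles("hdfs://1.2.3.4/topdir/*/*"), at the cost of hard-coding the nesting.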

Converting binary files

2017-12-30 Thread Christopher Piggott
I have been searching for examples but not finding exactly what I need. I am looking for the paradigm for using Spark 2.2 to convert a bunch of binary files into a bunch of different binary files. I'm starting with: val files = spark.sparkContext.binaryFiles("hdfs://1.2.3.4/input") then ...
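One pattern: read with binaryFiles(), transform the bytes, and write each result through the Hadoop FileSystem API from inside the tasks. A sketch in which the byte-flip and the output path are stand-ins for the real conversion:

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    val files = spark.sparkContext.binaryFiles("hdfs://1.2.3.4/input")

    files.foreach { case (name, stream) =>
      val in  = stream.toArray()           // whole file into memory
      val out = in.map(b => (~b).toByte)   // placeholder "conversion"
      val fs  = FileSystem.get(new URI("hdfs://1.2.3.4/"), new Configuration())
      val target = new Path("/converted/" + new Path(name).getName)
      val os = fs.create(target, true)     // overwrite if present
      try os.write(out) finally os.close()
    }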

Spark 2.2.1 worker invocation

2017-12-26 Thread Christopher Piggott
I need to set java.library.path to get access to some native code. Following directions, I made a spark-env.sh: #!/usr/bin/env bash export LD_LIBRARY_PATH="/usr/local/lib/libcdfNativeLibrary.so:/usr/local/lib/libcdf.so:${LD_LIBRARY_PATH}" export ...
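Worth noting that LD_LIBRARY_PATH conventionally lists directories, not individual .so files, so /usr/local/lib alone is probably what's wanted here. Spark also exposes this as configuration; a minimal sketch:

    import org.apache.spark.sql.SparkSession

    // Point the library *directory* (not the .so files themselves)
    // at the JVMs on both the driver and the executors.
    val spark = SparkSession.builder()
      .config("spark.driver.extraLibraryPath", "/usr/local/lib")
      .config("spark.executor.extraLibraryPath", "/usr/local/lib")
      .getOrCreate()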

NASA CDF files in Spark

2017-12-15 Thread Christopher Piggott
I'm looking to run a job that involves a zillion files in a format called CDF, a NASA standard. There are a number of libraries out there that can read CDFs, but most of them are not high quality compared to the official NASA one, which has Java bindings (via JNI). It's a little clumsy, but I have ...
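A common pattern for JNI-backed readers: make the native library visible on every executor via spark.executor.extraLibraryPath, and, since JNI code usually wants a local file rather than an HDFS stream, spill each file to a temp path before handing it over. Sketch only; openCdf, the install directory, and the HDFS path are stand-ins, not the NASA library's actual API:

    import java.nio.file.Files
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .config("spark.executor.extraLibraryPath", "/usr/local/cdf/lib")  // assumed install dir
      .getOrCreate()

    // Hypothetical wrapper around the JNI bindings.
    val openCdf = (localPath: String) => Seq.empty[String]   // placeholder

    val summaries = spark.sparkContext.binaryFiles("hdfs://1.2.3.4/cdf-files")
      .map { case (name, stream) =>
        // JNI readers generally need a local path, so spill to a temp file.
        val tmp = java.io.File.createTempFile("cdf-", ".cdf")
        Files.write(tmp.toPath, stream.toArray())
        try (name, openCdf(tmp.getAbsolutePath)) finally tmp.delete()
      }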