Self contained Spark application with local master without spark-submit

2022-01-19 Thread Colin Williams
Hello, I noticed I can run spark applications with a local master via sbt run and also via the IDE. I'd like to run a single-threaded worker application as a self-contained jar. What does sbt run employ that allows it to run a local master? Can I build an uber jar and run it without spark-submit?
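A minimal sketch of the idea (class and app names are illustrative): sbt run simply puts the Spark jars on the classpath, so an uber jar that bundles spark-core/spark-sql (i.e. the dependencies are not marked "provided") can be started with plain java -jar, as long as the application builds its own local SparkSession:

    import org.apache.spark.sql.SparkSession

    object LocalApp {
      def main(args: Array[String]): Unit = {
        // the app supplies its own master, so no spark-submit is needed
        val spark = SparkSession.builder()
          .appName("local-app")
          .master("local[1]")            // single-threaded local master
          .getOrCreate()

        import spark.implicits._
        Seq(1, 2, 3).toDF("n").show()    // placeholder job

        spark.stop()
      }
    }

The usual caveat when assembling the uber jar is merging Spark's META-INF service files (e.g. an sbt-assembly merge strategy), otherwise data source registrations can be lost.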

Re: Corrupt record handling in spark structured streaming and from_json function

2018-12-31 Thread Colin Williams
this, particularly without using multiple streams? On Wed, Dec 26, 2018 at 6:01 PM Colin Williams wrote: > > https://stackoverflow.com/questions/53938967/writing-corrupt-data-from-kafka-json-datasource-in-spark-structured-streaming > > On Wed, Dec 26, 2018 at 2:42 PM Colin Willi

Re: Corrupt record handling in spark structured streaming and from_json function

2018-12-26 Thread Colin Williams
https://stackoverflow.com/questions/53938967/writing-corrupt-data-from-kafka-json-datasource-in-spark-structured-streaming On Wed, Dec 26, 2018 at 2:42 PM Colin Williams wrote: > > From my initial impression it looks like I'd need to create my own > `from_json` using `json

Re: Corrupt record handling in spark structured streaming and from_json function

2018-12-26 Thread Colin Williams
From my initial impression it looks like I'd need to create my own `from_json` using `jsonToStructs` as a reference, but try to handle `case _: BadRecordException => null` or similar to try to write the non-matching string to a corrupt records column On Wed, Dec 26, 2018 at 1:55 PM
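A sketch of the same idea in user code, without reimplementing jsonToStructs, assuming a dataframe df with the raw payload in a string column value and the expected StructType in schema. It also assumes a Spark version in which from_json yields a null struct for unparseable input; newer releases may instead return a struct of null fields:

    import org.apache.spark.sql.functions.{col, from_json, when}

    val withCorrupt = df
      .withColumn("parsed", from_json(col("value"), schema))
      // keep the raw string only for records that did not parse
      .withColumn("_corrupt_record", when(col("parsed").isNull, col("value")))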

Corrupt record handling in spark structured streaming and from_json function

2018-12-26 Thread Colin Williams
Hi, I'm trying to figure out how I can write records that don't match a json read schema via spark structured streaming to an output sink / parquet location. Previously I did this in batch via the batch reader's corrupt column features. But in this spark structured streaming I'm reading from kafka a string
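One hedged sketch of a single-stream approach (topic, servers, paths and the schema are placeholders; foreachBatch needs Spark 2.4+, and the null check assumes a version where from_json returns null for records that don't match):

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    // expected JSON schema (illustrative)
    val schema = new StructType().add("id", StringType).add("amount", DoubleType)

    val parsed = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()
      .select($"value".cast("string").as("raw"))
      .withColumn("data", from_json($"raw", schema))

    parsed.writeStream
      .foreachBatch { (batch: DataFrame, _: Long) =>
        // records whose JSON did not match the schema
        batch.filter($"data".isNull).select($"raw")
          .write.mode("append").parquet("/sinks/corrupt")
        // records that parsed cleanly
        batch.filter($"data".isNotNull).select($"data.*")
          .write.mode("append").parquet("/sinks/good")
      }
      .option("checkpointLocation", "/checkpoints/events")
      .start()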

Re: Packaging kafka certificates in uber jar

2018-12-26 Thread Colin Williams
g-context > > Best, > Anastasios > > On Mon, Dec 24, 2018 at 10:29 PM Colin Williams > wrote: >> >> I've been trying to read from kafka via a spark streaming client. I >> found out spark cluster doesn't have certificates deployed. Then I >> tried using the

Packaging kafka certificates in uber jar

2018-12-24 Thread Colin Williams
I've been trying to read from kafka via a spark streaming client. I found out the spark cluster doesn't have certificates deployed. Then I tried using the same local certificates I've been testing with by packing them in an uber jar and getting a File handle from the Classloader resource. But I'm
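A hedged sketch of one way around that: Kafka's ssl.truststore.location / ssl.keystore.location options expect a real file path, and a resource inside the uber jar is not a file, so the resource can be copied out to a temp file first. Resource names are placeholders, and on a cluster the same extraction has to happen wherever the Kafka consumers actually run:

    import java.io.InputStream
    import java.nio.file.{Files, Path, StandardCopyOption}

    def resourceToTempFile(resource: String): Path = {
      val in: InputStream = getClass.getResourceAsStream(resource)
      require(in != null, s"resource not found: $resource")
      val tmp = Files.createTempFile("kafka-cert-", ".jks")
      try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING) finally in.close()
      tmp.toFile.deleteOnExit()
      tmp
    }

    val truststore = resourceToTempFile("/certs/kafka.truststore.jks")

    // then point the Kafka source at the extracted path, e.g.
    //   .option("kafka.security.protocol", "SSL")
    //   .option("kafka.ssl.truststore.location", truststore.toString)
    //   .option("kafka.ssl.truststore.password", sys.env("TRUSTSTORE_PASSWORD"))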

Re: Casting nested columns and updated nested struct fields.

2018-11-23 Thread Colin Williams
Looks like it's been reported already. It's too bad it's been a year but should be released into spark 3: https://issues.apache.org/jira/browse/SPARK-22231 On Fri, Nov 23, 2018 at 8:42 AM Colin Williams wrote: > > Seems like it's worthy of filing a bug against withColumn > > On Wed,

Re: Casting nested columns and updated nested struct fields.

2018-11-23 Thread Colin Williams
Seems like it's worthy of filing a bug against withColumn On Wed, Nov 21, 2018, 6:25 PM Colin Williams < colin.williams.seat...@gmail.com wrote: > Hello, > > I'm currently trying to update the schema for a dataframe with nested > columns. I would either like to update the schema

Casting nested columns and updated nested struct fields.

2018-11-21 Thread Colin Williams
Hello, I'm currently trying to update the schema for a dataframe with nested columns. I would either like to update the schema itself or cast the column without having to explicitly select all the columns just to cast one. In regards to updating the schema it looks like I would probably need to
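Until withColumn handles nested fields directly, one hedged workaround is to rebuild only the struct column, casting the field of interest and re-listing just its siblings (field names below are illustrative):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    import spark.implicits._

    // toy frame with a single nested struct column
    val df = Seq(("1", "2.5", "2018-11-21 18:25:00"))
      .toDF("id", "amount", "ts")
      .select(struct($"id", $"amount", $"ts").as("payload"))

    // cast one nested field by rebuilding the struct column in place
    val recast = df.withColumn("payload", struct(
      $"payload.id".as("id"),
      $"payload.amount".cast("double").as("amount"),
      $"payload.ts".cast("timestamp").as("ts")
    ))

    recast.printSchema()
    // Spark 3.1+ adds Column.withField, which avoids re-listing sibling fields.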

inferred schemas for spark streaming from a Kafka source

2018-11-13 Thread Colin Williams
Does anybody know how to use inferred schemas with structured streaming: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#schema-inference-and-partition-of-streaming-dataframesdatasets I have some code like : object StreamingApp { def launch(config: Config,
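For what it's worth, the schema-inference option in that link applies to file sources; the Kafka source exposes value as binary, so one hedged workaround is to infer the schema once from a representative static sample and reuse it with from_json (paths, topic and servers below are placeholders):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    // 1) infer once from a static sample of the same JSON payloads
    val inferred = spark.read.json("s3://bucket/sample-of-topic/").schema

    // 2) apply the inferred schema to the stream
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()
      .select(from_json($"value".cast("string"), inferred).as("data"))
      .select("data.*")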

Dataframe reader does not read microseconds, but TimestampType supports microseconds

2018-07-02 Thread Colin Williams
I'm confused as to why Spark's DataFrame reader does not read json (or similar) microsecond timestamps as microseconds, but instead reads them into millis. This seems strange when the TimestampType supports microseconds. For example create a schema for a json object with a column of
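A hedged workaround sketch, assuming the string-to-timestamp cast in the running Spark version keeps the fractional seconds (worth verifying on one record): declare the column as StringType in the read schema so the JSON reader can't truncate it, then cast afterwards. Column names and the path are illustrative:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().getOrCreate()

    val schema = StructType(Seq(
      StructField("id", StringType),
      StructField("event_time", StringType)   // keep the raw microsecond text
    ))

    val df = spark.read.schema(schema).json("src/test/resources/*.json")
      .withColumn("event_time", col("event_time").cast(TimestampType))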

Specifying a custom Partitioner on RDD creation in Spark 2

2018-04-10 Thread Colin Williams
https://stackoverflow.com/a/25204589 but it's from an older version of Spark. I'm hoping maybe there is something more recent and more in-depth. I don't mind references to books or otherwise. Best, Colin Williams
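The mechanism in that answer is, as far as I know, still the standard one in Spark 2.x: a Partitioner can't be attached to an arbitrary RDD at creation, but a pair RDD can be repartitioned with a custom Partitioner via partitionBy. A minimal sketch (the routing rule is a toy example):

    import org.apache.spark.Partitioner
    import org.apache.spark.sql.SparkSession

    class KeyRangePartitioner(override val numPartitions: Int) extends Partitioner {
      // toy rule: keys 0-9 go to partition 0, 10-19 to partition 1, ...
      override def getPartition(key: Any): Int = key match {
        case k: Int => math.min(math.max(k, 0) / 10, numPartitions - 1)
        case _      => 0
      }
    }

    val spark = SparkSession.builder().master("local[2]").getOrCreate()
    val rdd = spark.sparkContext
      .parallelize(Seq(1 -> "a", 11 -> "b", 25 -> "c"))
      .partitionBy(new KeyRangePartitioner(3))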

Re: spark-sql importing schemas from catalogString or schema.toString()

2018-03-28 Thread Colin Williams
t;) .load("src/test/resources/*.gz") df1.show(80) On Wed, Mar 28, 2018 at 5:10 PM, Colin Williams <colin.williams.seat...@gmail.com> wrote: > I've had more success exporting the schema toJson and importing that. > Something like: > > > val df1: DataFrame = session.r
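A hedged sketch of that toJson round trip, reusing the names from the snippet above (df1 and session); StructType's json output is the representation designed to be parsed back, unlike toString() or catalogString. The file path and reader format (assumed JSON) are placeholders:

    import java.nio.file.{Files, Paths}
    import org.apache.spark.sql.types.{DataType, StructType}

    // export
    Files.write(Paths.get("schema.json"), df1.schema.json.getBytes("UTF-8"))

    // import and reuse
    val restored = DataType.fromJson(
      new String(Files.readAllBytes(Paths.get("schema.json")), "UTF-8")
    ).asInstanceOf[StructType]

    val df2 = session.read.schema(restored).json("src/test/resources/*.gz")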

Re: spark-sql importing schemas from catalogString or schema.toString()

2018-03-28 Thread Colin Williams
t/resources/*.gz") df1.show(80) On Wed, Mar 28, 2018 at 3:25 PM, Colin Williams <colin.williams.seat...@gmail.com> wrote: > The toString representation looks like the following, where "someName" is unique: > > StructType(StructField("someName",StringType,true), > Str

Re: spark-sql importing schemas from catalogString or schema.toString()

2018-03-28 Thread Colin Williams
AME:struct<newValue:string,SOME_TABLE_NAME:string>, SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string, SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME: struct<newValue:s

spark-sql importing schemas from catalogString or schema.toString()

2018-03-28 Thread Colin Williams
I've been learning spark-sql and have been trying to export and import some of the generated schemas to edit them. I've been writing the schemas to strings like df1.schema.toString() and df.schema.catalogString. But I've been having trouble loading the schemas created. Does anyone know if it's
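Besides the JSON round trip sketched earlier in this thread, newer Spark versions also offer a DDL string, which is easier to hand-edit; toDDL appeared around Spark 2.4 and StructType.fromDDL around 2.3, so treat the exact versions as an assumption. Names and the path reuse the snippet above:

    import org.apache.spark.sql.types.StructType

    val ddl = df1.schema.toDDL                   // e.g. "`id` STRING, `amount` DOUBLE"
    val restored: StructType = StructType.fromDDL(ddl)
    val df2 = session.read.schema(restored).json("src/test/resources/*.gz")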