Self contained Spark application with local master without spark-submit
Hello,

I noticed I can run Spark applications with a local master via sbt run and also via the IDE. I'd like to run a single-threaded worker application as a self-contained jar. What does sbt run employ that allows it to run a local master? Can I build an uber jar and run it without spark-submit?

- To unsubscribe e-mail: user-unsubscr...@spark.apache.org
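A minimal sketch of what the question describes, offered as an illustration rather than a definitive answer. As far as I know, sbt run and the IDE do nothing special: they put the Spark jars on the application's classpath, and the local master runs inside the same JVM once the code sets it. The object name `LocalApp` and the app name are hypothetical. The sketch assumes the Spark dependencies are bundled into the uber jar (not marked "provided") and that the assembly step merges duplicate META-INF entries sensibly.

```scala
import org.apache.spark.sql.SparkSession

object LocalApp {
  // Builds a session whose master runs in-process on a single thread.
  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("local-app") // hypothetical application name
      .master("local[1]")   // one worker thread, no cluster manager
      .getOrCreate()

  // Tiny job used to show the session works: counts the numbers 0 until n.
  def countTo(n: Long): Long = {
    val spark = buildSession()
    try spark.range(n).count()
    finally spark.stop()
  }

  def main(args: Array[String]): Unit =
    println(countTo(100))
}
```

With Spark on the classpath, something like `java -cp app-assembly.jar LocalApp` (jar name hypothetical) should then start the job without spark-submit.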
Re: Corrupt record handling in spark structured streaming and from_json function
Dear Spark user community,

I have received some insight regarding filtering separate dataframes in my Spark Structured Streaming job. However, I'd like to write each of the dataframes mentioned in the Stack Overflow question below to a separate location using a parquet writer. My initial impression is that this requires multiple sinks, but I'm being pressured against that. I think it might also be possible using the foreach / foreachBatch writers, but I'm not sure about the parquet writer or about the caveats of this approach. Can some more advanced users or developers suggest how to go about this, particularly without using multiple streams?

On Wed, Dec 26, 2018 at 6:01 PM Colin Williams wrote:
> https://stackoverflow.com/questions/53938967/writing-corrupt-data-from-kafka-json-datasource-in-spark-structured-streaming
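The foreachBatch idea raised in this message could look roughly like the sketch below. It is an illustration under assumptions, not a tested recipe: it assumes Spark 2.4 or later (where foreachBatch exists), a stream with a column `raw` holding the source string and a column `parsed` holding the from_json result, and that `parsed` is null for records that failed to parse. The object name and the paths passed in are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.streaming.StreamingQuery

object TwoOutputs {
  // Writes one micro-batch to two parquet locations.
  // Assumes columns "raw" (source string) and "parsed" (struct or null).
  def writeBatch(validPath: String, corruptPath: String)
                (batch: DataFrame, batchId: Long): Unit = {
    batch.persist() // avoid recomputing the batch for the second write
    try {
      batch.filter(col("parsed").isNotNull).select("parsed.*")
        .write.mode("append").parquet(validPath)
      batch.filter(col("parsed").isNull).select("raw")
        .write.mode("append").parquet(corruptPath)
    } finally batch.unpersist()
  }

  // Starts a single streaming query that feeds both locations.
  def start(stream: DataFrame, validPath: String, corruptPath: String,
            checkpoint: String): StreamingQuery =
    stream.writeStream
      .foreachBatch(writeBatch(validPath, corruptPath) _)
      .option("checkpointLocation", checkpoint)
      .start()
}
```

One caveat to keep in mind: the two writes are not atomic, so if a batch fails between them and is retried, the first location can receive the same rows twice.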
Re: Corrupt record handling in spark structured streaming and from_json function
https://stackoverflow.com/questions/53938967/writing-corrupt-data-from-kafka-json-datasource-in-spark-structured-streaming
Re: Corrupt record handling in spark structured streaming and from_json function
From my initial impression it looks like I'd need to create my own `from_json`, using `jsonToStructs` as a reference, but try to handle `case _: BadRecordException => null` or similar, so that the non-matching string is written to a corrupt records column.
Corrupt record handling in spark structured streaming and from_json function
Hi,

I'm trying to figure out how I can write records that don't match a JSON read schema via Spark Structured Streaming to an output sink / parquet location. Previously I did this in batch via the corrupt record column feature of the batch reader. But in this Structured Streaming job I'm reading a string from Kafka and using from_json on the value of that string. If it doesn't match my schema, then from_json returns null for all the rows and does not populate a corrupt record column. But I want to somehow obtain the source Kafka string in a dataframe and write it to an output sink / parquet location.

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types.StructType

    def getKafkaEventDataFrame(rawKafkaDataFrame: DataFrame, schema: StructType) = {
      val jsonDataFrame = rawKafkaDataFrame.select(col("value").cast("string"))
      jsonDataFrame.select(from_json(col("value"), schema)).select("jsontostructs(value).*")
    }
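One possible way to keep the source string around, sketched here under assumptions rather than taken from the thread: select the raw value alongside the from_json result and treat a row as corrupt when every top-level field of the parsed struct is null. That matches the "all nulls" symptom described above, but it would also flag a valid record whose fields are all legitimately null. The names `CorruptSplit`, `raw`, and `parsed` are hypothetical, and top-level field names are assumed not to contain dots.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, from_json, lit}
import org.apache.spark.sql.types.StructType

object CorruptSplit {
  // Keeps the raw Kafka string next to the parsed struct.
  def parseKeepingRaw(rawKafka: DataFrame, schema: StructType): DataFrame =
    rawKafka
      .select(col("value").cast("string").as("raw"))
      .withColumn("parsed", from_json(col("raw"), schema))

  // True when every top-level field of "parsed" is null
  // (this is also true when "parsed" itself is null).
  private def allFieldsNull(schema: StructType): Column =
    schema.fieldNames.foldLeft(lit(true)) { (acc, name) =>
      acc && col(s"parsed.$name").isNull
    }

  // Rows that produced at least one non-null field, flattened.
  def valid(parsed: DataFrame, schema: StructType): DataFrame =
    parsed.filter(!allFieldsNull(schema)).select("parsed.*")

  // Source strings of rows that produced nothing usable.
  def corrupt(parsed: DataFrame, schema: StructType): DataFrame =
    parsed.filter(col("raw").isNotNull && allFieldsNull(schema)).select("raw")
}
```

Both results derive from the same input, so they can be written from a single query or from two, depending on how many sinks are acceptable.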
Re: Packaging kafka certificates in uber jar
Hi, thanks. This is part of the solution I found after writing the question. The other part is that I needed to write the input stream to a temporary file. I would prefer not to write any temporary file, but the ssl.keystore.location property seems to expect a file path.

On Tue, Dec 25, 2018 at 5:26 AM Anastasios Zouzias wrote:
> Hi Colin,
>
> You can place your certificates under src/main/resources and include them in
> the uber JAR, see e.g.:
> https://stackoverflow.com/questions/40252652/access-files-in-resources-directory-in-jar-from-apache-spark-streaming-context
>
> Best,
> Anastasios
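The temporary-file step described in this reply can be sketched as follows. A resource packed inside a jar is not a file on disk, which is the likely reason a File handle built from the resource URL is not found, so the bytes are copied out first. The object name, the temp-file prefix, and the resource name in the comment are hypothetical.

```scala
import java.io.InputStream
import java.nio.file.{Files, Path, StandardCopyOption}

object KeystoreResource {
  // Copies a stream into a fresh temp file and returns its path.
  // The file is marked for deletion when the JVM exits.
  def toTempFile(in: InputStream, prefix: String, suffix: String): Path = {
    val tmp = Files.createTempFile(prefix, suffix)
    tmp.toFile.deleteOnExit()
    try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    tmp
  }

  // Loads a classpath resource such as "/client.keystore.jks"
  // (hypothetical name) and materialises it as a temp file.
  def resourceToTempFile(resource: String): Path = {
    val in = getClass.getResourceAsStream(resource)
    require(in != null, s"resource not found on classpath: $resource")
    toTempFile(in, "keystore-", ".jks")
  }
}
```

The returned path could then be passed as `.option("kafka.ssl.keystore.location", path.toString)`. Note that the copy only exists on the machine where this code runs, so on a cluster it would have to happen wherever the Kafka consumers are created.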
Packaging kafka certificates in uber jar
I've been trying to read from Kafka via a Spark streaming client. I found out that the Spark cluster doesn't have certificates deployed. I then tried using the same local certificates I've been testing with, by packing them in an uber jar and getting a File handle from the ClassLoader resource, but I'm getting a File Not Found exception. These are JKS certificates. Is anybody aware of how to package certificates in a jar for a Kafka client, preferably the Spark one?
Re: Casting nested columns and updated nested struct fields.
Looks like it's been reported already. It's too bad it's been a year, but it should be released in Spark 3: https://issues.apache.org/jira/browse/SPARK-22231
Re: Casting nested columns and updated nested struct fields.
Seems like it's worthy of filing a bug against withColumn.
Casting nested columns and updated nested struct fields.
Hello,

I'm currently trying to update the schema for a dataframe with nested columns. I would like either to update the schema itself or to cast the column without having to explicitly select all the columns just to cast one.

With regard to updating the schema, it looks like I would probably need to write a more complex map over the schema to find the StructFields I want to update and update them. I haven't found any examples of this, but it seems like there should be a simpler way to do it.

With regard to changing the column on the dataframe itself, using e.g.

    val newDF = df.withColumn("existing.top.level.FIELD_NAME", df.col("existing.top.level.FIELD_NAME").cast(LongType))

I end up with a new column named "existing.top.level.FIELD_NAME" at the root level instead of the nested column being updated to the new type. Has anybody worked out how to update a nested column's datatype, and also how to update the column type in the nested schema StructType? Are there any easy ways to do this, or is there a reason it is not trivial?
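Until something like SPARK-22231 is available, one workaround is to rebuild the parent struct and cast the field while doing so. The sketch below is only an illustration: it handles a single level of nesting (a top-level struct and one direct child), and deeper paths would need the same idea applied recursively. The helper name `castChild` is hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.{DataType, StructType}

object NestedCast {
  // Rebuilds the top-level struct column `parent`, casting its direct
  // child field `child` to type `to` and copying the other fields as-is.
  def castChild(df: DataFrame, parent: String, child: String, to: DataType): DataFrame = {
    val fields = df.schema(parent).dataType.asInstanceOf[StructType].fieldNames
    val rebuilt = fields.map { name =>
      val c = col(s"$parent.$name")
      if (name == child) c.cast(to).as(name) else c.as(name)
    }
    // withColumn with an existing top-level name replaces that column.
    df.withColumn(parent, struct(rebuilt: _*))
  }
}
```

Usage would be along the lines of `NestedCast.castChild(df, "existing", "FIELD_NAME", LongType)`; the field order inside the struct is preserved because the fields are rebuilt in schema order.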
inferred schemas for spark streaming from a Kafka source
Does anybody know how to use inferred schemas with structured streaming?
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#schema-inference-and-partition-of-streaming-dataframesdatasets

I have some code like:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, from_json}

    object StreamingApp {
      def launch(config: Config, spark: SparkSession): Unit = {
        import spark.implicits._

        val schemaJson = spark.sparkContext.parallelize(List(config.schema))
        val schemaDF = spark.read.json(schemaJson)
        schemaDF.printSchema()

        // read text from kafka
        val df = spark
          .readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", config.broker)
          .option("subscribe", config.topic)
          .option("startingOffsets", "earliest")
          .load()

        spark.sql("set spark.sql.streaming.schemaInference=true")

        val jsonOptions = Map[String, String]("mode" -> "FAILFAST")
        val org_store_event_df = df.select(
            col("key").cast("string"),
            from_json(col("value").cast("string"), schemaDF.schema, jsonOptions))
          .writeStream
          .format("console")
          .start()
          .awaitTermination()
      }
    }

I'd like to compare an inferred schema against the one I provide, to determine what is missing from my provided schema or why I end up with all nulls in my value column. Currently I'm using a schema to read from a JSON file, but I'd like to infer the schema from the stream as suggested by the docs. I'm not sure how to replace from_json so that the value column is read using an inferred schema, or otherwise. Maybe it's not supported for Kafka streams and only for file streams? If that is the case, then why do they have different implementations? Also, shouldn't we make the documentation clearer?
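As far as I can tell, the schema inference setting in the linked section concerns file-based sources, while the Kafka source always exposes a fixed schema with a binary value column. A possible workaround, sketched here as an assumption rather than a documented feature, is to infer the schema from a batch (non-streaming) read of the same topic and then pass it to from_json in the stream. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

object InferFromSample {
  // Infers a schema from the JSON strings in the "value" column
  // of a static (non-streaming) DataFrame.
  def inferSchema(spark: SparkSession, sample: DataFrame): StructType = {
    import spark.implicits._
    spark.read.json(sample.selectExpr("CAST(value AS STRING)").as[String]).schema
  }

  // Batch read of a Kafka topic, used only to obtain a sample to infer from.
  def kafkaSample(spark: SparkSession, broker: String, topic: String): DataFrame =
    spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", broker)
      .option("subscribe", topic)
      .option("startingOffsets", "earliest")
      .option("endingOffsets", "latest")
      .load()
}
```

Comparing the result with the provided schema, for instance with `inferred.fieldNames.diff(provided.fieldNames)`, should show which top-level fields the provided schema lacks.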
Dataframe reader does not read microseconds, but TimestampType supports microseconds
I'm confused as to why Spark's DataFrame reader does not support reading JSON or similar with microsecond timestamps to microsecond precision, but instead reads them into millis. This seems strange when TimestampType supports microseconds.

For example, create a schema for a JSON object with a column of TimestampType. Then read data from that column with timestamps with microseconds like

    2018-05-13 20:25:34.153712
    2018-05-13T20:25:37.348006

You will end up with timestamps with millisecond precision, e.g.

    2018-05-13 20:25:34.153

When reading about TimestampType: "The data type representing java.sql.Timestamp values. Please use the singleton DataTypes.TimestampType."

java.sql.Timestamp provides a method that reads timestamps like Timestamp.valueOf("2018-05-13 20:25:37.348006"), including microseconds.

So why does Spark's DataFrame reader drop the ball on this?
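The point about java.sql.Timestamp can be checked without Spark. The small sketch below only shows that Timestamp.valueOf keeps the microsecond digits; the object name `MicrosCheck` is hypothetical.

```scala
import java.sql.Timestamp

object MicrosCheck {
  // Returns the fractional second of the parsed timestamp in microseconds.
  // Timestamp stores the fraction as nanoseconds, so divide by 1000.
  def micros(s: String): Int = Timestamp.valueOf(s).getNanos / 1000
}
```

For "2018-05-13 20:25:37.348006" this gives 348006, so the precision loss described above happens in the reader's parsing step rather than in the type. A workaround sometimes suggested, which I have not verified for every Spark version, is to read the column as a string and cast it to timestamp afterwards.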
Specifying a custom Partitioner on RDD creation in Spark 2
Hi,

I'm currently creating RDDs using a pattern like the following:

    val rdd: RDD[String] = session.sparkContext.parallelize(longkeys).flatMap(
      key => {
        logInfo(s"job at key: ${key}")
        Source.fromBytes(S3Util.getBytes(S3Util.getClient(region, S3Util.getCredentialsProvider("INSTANCE", "")), bucket, key))
          .getLines()
      }
    )

We've been using this pattern or similar to work around issues regarding S3 and our Hadoop version. However, this same pattern could be applied to other types of data sources which may not have a connector. This method has been working out fairly well, but I'd like more control over how the data is partitioned from the start.

I tried to manually partition the data without a partitioner, but got JVM exceptions regarding my arrays being too large for the JVM.

    val keyList = groupedKeys.keys.toList
    val rdd: RDD[String] = session.sparkContext.parallelize(keyList, keyList.length).flatMap { key =>
      logInfo(s"job at day: ${key}")
      val byteArrayBuffer = new ArrayBuffer[Byte]()
      val objectKeyList: List[String] = groupedKeys(key)
      objectKeyList.foreach(
        objectKey => {
          logInfo(s"working on object: ${objectKey}")
          byteArrayBuffer.appendAll(S3Util.getBytes(S3Util.getClient(region, S3Util.getCredentialsProvider("INSTANCE", "")), bucket, objectKey))
        }
      )
      Source.fromBytes(byteArrayBuffer.toArray[Byte]).getLines()
    }

Then I defined a custom partitioner based on my source data:

    class dayPartitioner(keys: List[String]) extends Partitioner with Logger {
      val keyMap: Map[String, List[String]] = keys.groupBy(_.substring(0, 10))
      val partitions = keyMap.keySet.size
      val partitionMap: Map[String, Int] = keyMap.keys.zipWithIndex.toMap

      override def getPartition(key: Any): Int = {
        val keyString = key.asInstanceOf[String]
        val partitionKey = keyString.substring(0, 10)
        partitionMap(partitionKey)
      }

      override def numPartitions: Int = partitions
    }

I'd like to know: do I have to create a custom RDD class to specify my RDD and use it like in the pattern above? If so, I'd also like a reference on doing this, to hopefully save me some headaches and gotchas from a naive approach. I've found one such example, https://stackoverflow.com/a/25204589, but it's from an older version of Spark. I'm hoping there is something more recent and more in-depth. I don't mind references to books or otherwise.

Best,
Colin Williams
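A custom RDD class may not be needed for this. One option, sketched below under assumptions, is to turn the keys into a pair RDD, apply the partitioner with partitionBy, and only then fetch the data, one object at a time. Fetching per object and returning an iterator also avoids concatenating a whole day into a single byte array, which is the likely source of the array-size exceptions since JVM arrays are capped at about 2^31 elements. `DayPartitioner`, `PartitionedLoad`, and the `fetchLines` parameter (a stand-in for the S3 download) are hypothetical names.

```scala
import org.apache.spark.{Partitioner, SparkContext}
import org.apache.spark.rdd.RDD

// One partition per distinct 10-character key prefix (e.g. "2018-05-13").
class DayPartitioner(keys: Seq[String]) extends Partitioner {
  private val index: Map[String, Int] =
    keys.map(_.take(10)).distinct.sorted.zipWithIndex.toMap

  override def numPartitions: Int = index.size

  override def getPartition(key: Any): Int =
    index(key.asInstanceOf[String].take(10))
}

object PartitionedLoad {
  // Shuffles only the small object keys into their day partitions,
  // then downloads each object inside the partition that owns it.
  def load(sc: SparkContext, keys: Seq[String])
          (fetchLines: String => Iterator[String]): RDD[String] =
    sc.parallelize(keys)
      .map(k => (k, ()))                       // pair RDD so partitionBy applies
      .partitionBy(new DayPartitioner(keys))
      .flatMap { case (k, _) => fetchLines(k) }
}
```

The function passed as `fetchLines` is shipped to the executors, so it has to be serializable, and any client it uses should be created inside it.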
Re: spark-sql importing schemas from catalogString or schema.toString()
    val test_schema = DataType.fromJson(schema).asInstanceOf[StructType]
    val session = SparkHelper.getSparkSession
    val df1: DataFrame = session.read
      .format("json")
      .schema(test_schema)
      .option("inferSchema", "false")
      .option("mode", "FAILFAST")
      .load("src/test/resources/*.gz")
    df1.show(80)
Re: spark-sql importing schemas from catalogString or schema.toString()
I've had more success exporting the schema to JSON and importing that. Something like:

    val df1: DataFrame = session.read
      .format("json")
      .schema(test_schema)
      .option("inferSchema", "false")
      .option("mode", "FAILFAST")
      .load("src/test/resources/*.gz")
    df1.show(80)
Re: spark-sql importing schemas from catalogString or schema.toString()
The to String representation look like where "someName" is unique: StructType(StructField("someName",StringType,true), StructField("someName",StructType(StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName", StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName", StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), 
StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType, true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName", StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName", StructType(StructField("someName",StringType,true), StructField("someName",StringType,true)),true), StructField("someName",StructType(StructField("someName",StringType,true), StructField("someName",StringType, true)),true)),true), StructField("someName",BooleanType,true), StructField("someName",LongType,true), StructField("someName",StringType,true), StructField("someName",StringType,true), StructField("someName",StringType,true), StructField("someName",StringType,true)) The catalogString looks something like where SOME_TABLE_NAME is unique: struct, SOME_TABLE_NAME:struct,SOME_TABLE_NAME:struct, SOME_TABLE_NAME:struct,SOME_TABLE_NAME:struct,SOME_TABLE_NAME:struct,SOME_TABLE_NAME: struct,SOME_TABLE_NAME:struct,SOME_TABLE_NAME:struct,SOME_TABLE_NAME:struct,SOME_TABLE_NAME:struct, SOME_TABLE_NAME:struct,SOME_TABLE_NAME:struct,SOME_TABLE_NAME:struct,SOME_TABLE_NAME: struct,SOME_TABLE_NAME:struct,SOME_TABLE_NAME:struct,SOME_TABLE_NAME:struct,SOME_TABLE_NAME:struct, SOME_TABLE_NAME:struct,SOME_TABLE_NAME:struct,SOME_TABLE_NAME:struct,SOME_TABLE_NAME: struct>,SOME_TABLE_NAME:boolean,SOME_TABLE_NAME:bigint, SOME_TABLE_NAME:string,SOME_TABLE_NAME:string,SOME_TABLE_NAME:string,SOME_TABLE_NAME:string> On Wed, Mar 28, 2018 at 
2:32 PM, Colin Williams wrote:
> I've been learning spark-sql and have been trying to export and import
> some of the generated schemas to edit them. I've been writing the
> schemas to strings like df1.schema.toString() and
> df.schema.catalogString
>
> But I've been having trouble loading the schemas created. Does anyone
> know if it's possible to work with the catalogString? I couldn't find
> too many resources working with it. Is it possible to create a schema
> from this string and load from it using the SparkSession?
>
> Similarly I haven't yet successfully loaded the toString schema, after
> some small edits...
>
> There's a little tidbit about some of this here:
> https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-DataType.html
spark-sql importing schemas from catalogString or schema.toString()
I've been learning spark-sql and have been trying to export and import some of the generated schemas to edit them. I've been writing the schemas to strings like df1.schema.toString() and df.schema.catalogString.

But I've been having trouble loading the schemas created. Does anyone know if it's possible to work with the catalogString? I couldn't find too many resources working with it. Is it possible to create a schema from this string and load from it using the SparkSession?

Similarly, I haven't yet successfully loaded the toString schema, after some small edits...

There's a little tidbit about some of this here: https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-DataType.html
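For editing a schema by hand, the JSON form tends to round-trip more reliably than toString or catalogString. A minimal sketch, with the hypothetical helper names `toJsonString` and `load`:

```scala
import org.apache.spark.sql.types.{DataType, StructType}

object SchemaRoundTrip {
  // Serialises a schema to indented JSON that can be edited by hand.
  def toJsonString(schema: StructType): String = schema.prettyJson

  // Parses the JSON form back into a StructType.
  // Throws if the JSON does not describe a struct.
  def load(json: String): StructType =
    DataType.fromJson(json).asInstanceOf[StructType]
}
```

The loaded StructType can be handed to `session.read.schema(...)` like any other schema, so the edited file can live alongside the job's configuration.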