[Spark] How to find which type of key is illegal during from_json() function

2023-03-08 Thread hueiyuan su
Component: Spark. Level: Advanced. Scenario: How-to. Problem Description: I have a nested JSON string value in one field of a Spark dataframe, and I would like to use from_json() to parse the JSON object. In particular, if one of the key types does not match our defined
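
A minimal sketch of one way to localize the offending key (the column name, data, and schema below are inventions for illustration, not from the thread): re-parse the string against each field's one-field schema, so the field whose declared type the value violates comes back null.
```
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._   // assumes an active SparkSession named spark (e.g. spark-shell)

// Hypothetical data: "id" should be an integer but arrives as a string.
val df = Seq("""{"id": "oops", "name": "a"}""").toDF("payload")
val schema = new StructType().add("id", IntegerType).add("name", StringType)

// Re-parse against each single-field schema; a null extraction flags the
// field whose declared type the JSON value cannot satisfy. Caveat: a key
// that is missing from the JSON altogether also shows up as null here.
val flagged = schema.fields.foldLeft(df) { (acc, f) =>
  acc.withColumn(s"${f.name}_ok",
    from_json($"payload", new StructType().add(f))(f.name).isNotNull)
}
flagged.show(false)   // id_ok = false, name_ok = true
```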

Re: Corrupt record handling in spark structured streaming and from_json function

2018-12-31 Thread Colin Williams
ams wrote: > From my initial impression it looks like I'd need to create my own `from_json` using `jsonToStructs` as a reference but try to handle `case _: BadRecordException => null` or similar to try to write the non-matching string to a cor

Re: Corrupt record handling in spark structured streaming and from_json function

2018-12-26 Thread Colin Williams
https://stackoverflow.com/questions/53938967/writing-corrupt-data-from-kafka-json-datasource-in-spark-structured-streaming On Wed, Dec 26, 2018 at 2:42 PM Colin Williams wrote: > From my initial impression it looks like I'd need to create my own `from_json` using `json

Re: Corrupt record handling in spark structured streaming and from_json function

2018-12-26 Thread Colin Williams
From my initial impression it looks like I'd need to create my own `from_json` using `jsonToStructs` as a reference but try to handle `case _: BadRecordException => null` or similar to try to write the non-matching string to a corrupt records column On Wed, Dec 26, 2018 at 1:55 PM

Corrupt record handling in spark structured streaming and from_json function

2018-12-26 Thread Colin Williams
and using from_json on the value of that string. If it doesn't match my schema then from_json returns null for all the rows, and does not populate a corrupt record column. But I want to somehow obtain the source Kafka string in a dataframe, and write it to an output sink / parquet location. def
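
For reference, a sketch of where this thread lands (the schema, topic, paths, and column names are assumptions, not Colin's actual job): keep the raw Kafka string next to the from_json result and route rows whose parsed struct is null to a separate corrupt-record sink.
```
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._   // assumes an active SparkSession named spark

val schema = new StructType().add("number", IntegerType)   // assumed schema

val kafkaDF = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")     // assumption
  .option("subscribe", "events")                           // assumption
  .load()

// Keep the source Kafka string alongside the parsed struct.
val parsed = kafkaDF
  .selectExpr("CAST(value AS STRING) AS raw")
  .withColumn("parsed", from_json($"raw", schema))

// from_json yields a null struct for non-matching records, so a null test
// splits good rows from corrupt ones without any corrupt-record column.
val good    = parsed.where($"parsed".isNotNull).select($"parsed.*")
val corrupt = parsed.where($"parsed".isNull).select($"raw")

// Route the raw non-matching strings to their own parquet location
// (the good rows would go to the main sink the same way).
corrupt.writeStream
  .format("parquet")
  .option("path", "/tmp/corrupt")                          // assumption
  .option("checkpointLocation", "/tmp/corrupt-checkpoint") // assumption
  .start()
```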

Re: from_json function

2018-08-16 Thread dbolshak
Maxim, thanks for your reply. I've left a comment in the following jira issue: https://issues.apache.org/jira/browse/SPARK-23194?focusedCommentId=16582025&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16582025

from_json schema order

2018-08-15 Thread Brandon Geise
Hi, Can someone confirm whether ordering matters between the schema and the underlying JSON string? Thanks, Brandon
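
For what it's worth, a quick check one can run (JSON sample and schema invented here): from_json resolves fields by name, so the schema's field order does not have to mirror the key order in the JSON text.
```
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._   // assumes an active SparkSession named spark

// Schema declares b before a; the JSON string has a before b.
val schema = new StructType().add("b", StringType).add("a", IntegerType)
Seq("""{"a": 1, "b": "x"}""").toDF("json")
  .select(from_json($"json", schema).as("parsed"))
  .select("parsed.a", "parsed.b")
  .show()   // both fields populated despite the different ordering
```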

Re: from_json function

2018-08-15 Thread Maxim Gekk
Hello Denis, The from_json function supports only the fail-fast mode, see: https://github.com/apache/spark/blob/e2ab7deae76d3b6f41b9ad4d0ece14ea28db40ce/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L568 Your settings "mode" ->

from_json function

2018-08-15 Thread dbolshak
Hello community, I cannot manage to run the from_json method with the "columnNameOfCorruptRecord" option.
```
import org.apache.spark.sql.functions._

val data = Seq(
  "{'number': 1}",
  "{'number': }"
)
val schema = new StructType()
```
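
A hedged completion of the truncated snippet above; the remaining lines, the corrupt-column name, and the options map are guesses at what the original code did, not a recovered copy:
```
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._   // assumes an active SparkSession named spark

val data = Seq("{'number': 1}", "{'number': }")
val schema = new StructType()
  .add("number", IntegerType)
  .add("_corrupt_record", StringType)   // assumed corrupt-column name

data.toDF("json")
  .select(from_json($"json", schema,
    Map("columnNameOfCorruptRecord" -> "_corrupt_record")).as("parsed"))
  .show(false)
```
As Maxim's reply above explains, from_json at the time parsed in fail-fast mode internally, so rather than populating _corrupt_record the malformed second row simply comes back as a null struct.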

Re: How to handle java.sql.Date inside Maps with to_json / from_json

2018-06-28 Thread Patrick McGloin
> jsonDF = mapDF.select(to_json(struct(mapDF.columns.head, mapDF.columns.tail: _*), options))
> jsonDF.show()
> val jsonString = jsonDF.map(_.getString(0)).collect().head
> val stringDF = Seq(jsonString).toDF("json")
> val parsedD

How to handle java.sql.Date inside Maps with to_json / from_json

2018-06-28 Thread Patrick McGloin
to_json(struct(mapDF.columns.head, mapDF.columns.tail: _*), options))
jsonDF.show()
val jsonString = jsonDF.map(_.getString(0)).collect().head
val stringDF = Seq(jsonString).toDF("json")
val parsedDF = stringDF.select(from_json($"json", sc
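
A self-contained reconstruction of the round trip being discussed, under assumptions the preview cannot confirm (the map column name, date value, dateFormat option, and the schema handed back to from_json are all mine):
```
import java.sql.Date
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._   // assumes an active SparkSession named spark

val mapDF = Seq(Map("d" -> Date.valueOf("2018-06-28"))).toDF("m")

val options = Map("dateFormat" -> "yyyy-MM-dd")   // assumption
val jsonDF = mapDF.select(to_json(struct(mapDF.columns.head, mapDF.columns.tail: _*), options))
jsonDF.show(false)

val jsonString = jsonDF.map(_.getString(0)).collect().head
val stringDF = Seq(jsonString).toDF("json")

// Whether DateType survives inside a MapType value on the way back is
// exactly what the thread is probing.
val schema = new StructType().add("m", MapType(StringType, DateType))
val parsedDF = stringDF.select(from_json($"json", schema).as("parsed"))
parsedDF.show(false)
```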

Re: pyspark + from_json(col("col_name"), schema) returns all null

2017-12-10 Thread Jacek Laskowski
Hi, Not that I'm aware of, but in your case checking whether a JSON message fits your schema and the pipeline could have been done with pyspark alone and JSONs on disk, couldn't it? Regards, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming
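
A sketch of that suggestion, in Scala for consistency with the rest of this page even though the thread is pyspark (the path, field name, and schema are invented): point the batch JSON reader at sample messages on disk and see whether the schema holds up before Kafka enters the picture.
```
import org.apache.spark.sql.types._
// assumes an active SparkSession named spark

// Include a corrupt-record column so bad rows surface instead of vanishing.
val schema = new StructType()
  .add("col_name", StringType)                  // hypothetical field
  .add("_corrupt_record", StringType)

val sample = spark.read
  .schema(schema)
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .json("/tmp/sample-messages.json")            // assumed path
sample.show(false)
// Rows landing in _corrupt_record (or coming back all null) reproduce the
// from_json symptom and point at a schema/type mismatch, which is what the
// root-cause reply below reports.
```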

Re: pyspark + from_json(col("col_name"), schema) returns all null

2017-12-10 Thread salemi
I found the root cause! There was a mismatch between the StructField type and the json message. Is there a good write-up / wiki out there that describes how to debug Spark jobs? Thanks
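
The failure mode described here, reproduced in miniature (Scala rather than pyspark, with an invented field): declare a field as IntegerType while the message carries a string, and from_json quietly returns null.
```
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._   // assumes an active SparkSession named spark

// The JSON value is a string, but the schema claims an integer.
val wrong = new StructType().add("value", IntegerType)
Seq("""{"value": "abc"}""").toDF("json")
  .select(from_json($"json", wrong).as("parsed"))
  .show()   // parsed (or its field) is null: the type mismatch is at fault
```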

pyspark + from_json(col("col_name"), schema) returns all null

2017-12-09 Thread salemi
Hi All, I am using pyspark and consuming messages from Kafka, and when I .select(from_json(col("col_name"), schema)) the return values are all null. I looked at the json messages and they are valid strings. Any ideas?

RE: from_json()

2017-08-30 Thread JG Perrin
apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
From: JG Perrin [mailto:jper...@

RE: from_json()

2017-08-28 Thread JG Perrin
Thanks Sam – this might be the solution. I will investigate! From: Sam Elamin [mailto:hussam.ela...@gmail.com] Sent: Monday, August 28, 2017 1:14 PM To: JG Perrin <jper...@lumeris.com> Cc: user@spark.apache.org Subject: Re: from_json() Hi jg, Perhaps I am misunderstanding you, but if yo

Re: from_json()

2017-08-28 Thread Sam Elamin
this new dataframe sqlContext.createDataFrame(oldDF.rdd, newSchema) Regards Sam On Mon, Aug 28, 2017 at 5:57 PM, JG Perrin <jper...@lumeris.com> wrote: > Is there a way to not have to specify a schema when using from_json() or infer the schema? When you read a JSON doc from disk, y
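
A minimal rendering of the re-application trick quoted here (oldDF and newSchema are stand-ins; note that createDataFrame rebinds metadata without casting, so the underlying row types must already line up):
```
import org.apache.spark.sql.types._
import spark.implicits._   // assumes an active SparkSession named spark

val oldDF = Seq(("1", "x")).toDF("a", "b")
val newSchema = StructType(Seq(
  StructField("first", StringType),
  StructField("second", StringType)))

// Same rows, new schema; this fails at runtime if the types truly differ.
val newDF = spark.createDataFrame(oldDF.rdd, newSchema)
newDF.printSchema()
```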

from_json()

2017-08-28 Thread JG Perrin
Is there a way to not have to specify a schema when using from_json(), or to infer the schema? When you read a JSON doc from disk, you can infer the schema. Should I write it to disk before (ouch)? jg
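
One common workaround, offered as a sketch rather than the thread's confirmed answer: infer the schema once from a sample of the strings via spark.read.json (which accepts a Dataset[String] since Spark 2.2), then hand that schema to from_json, so nothing has to be written to disk.
```
import org.apache.spark.sql.functions._
import spark.implicits._   // assumes an active SparkSession named spark

// Hypothetical column of JSON strings standing in for the real data.
val jsonDF = Seq("""{"a": 1, "b": "x"}""").toDF("json")

// Infer in memory from a sample instead of round-tripping through disk.
val inferred = spark.read.json(jsonDF.select($"json").as[String]).schema

jsonDF.select(from_json($"json", inferred).as("parsed")).show(false)
```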