Re: from_json function

2018-08-16 Thread dbolshak
Maxim, thanks for your reply.

I've left a comment in the following jira issue:
https://issues.apache.org/jira/browse/SPARK-23194?focusedCommentId=16582025&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16582025



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: from_json function

2018-08-15 Thread Maxim Gekk
Hello Denis,

The from_json function supports only the FAILFAST mode, see:
https://github.com/apache/spark/blob/e2ab7deae76d3b6f41b9ad4d0ece14ea28db40ce/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L568

Your setting "mode" -> "PERMISSIVE" will be overwritten.
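
A possible workaround (a sketch, not the documented behavior of the option: it assumes the `sourceDf` and `schema` from the quoted code below, and relies on `from_json` returning a null struct when parsing fails, so the raw string column can stand in for the corrupt record):

```scala
import org.apache.spark.sql.functions.{col, from_json, when}

// from_json yields a null struct for malformed input, so keep the raw
// column alongside the parsed struct and fall back to it on failure.
val jsonedDf = sourceDf
  .select(col("column"), from_json(col("column"), schema) as "data")
  .select(
    col("data.number"),
    when(col("data").isNull, col("column")).as("_corrupt_record"))
```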

On Wed, Aug 15, 2018 at 4:52 PM dbolshak  wrote:

> Hello community,
>
> I cannot get the from_json method to work with the "columnNameOfCorruptRecord"
> option.
> ```
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.types._
> import spark.implicits._
>
> val data = Seq(
>   "{'number': 1}",
>   "{'number': }"
> )
>
> val schema = new StructType()
>   .add($"number".int)
>   .add($"_corrupt_record".string)
>
> val sourceDf = data.toDF("column")
>
> val jsonedDf = sourceDf
>   .select(from_json(
> $"column",
> schema,
> Map("mode" -> "PERMISSIVE", "columnNameOfCorruptRecord" -> "_corrupt_record")
>   ) as "data")
>   .selectExpr("data.number", "data._corrupt_record")
>
>   jsonedDf.show()
> ```
> Can anybody help me get `_corrupt_record` non-empty?
>
> Thanks in advance.
>
>
>

-- 

Maxim Gekk

Technical Solutions Lead

Databricks Inc.

maxim.g...@databricks.com

databricks.com

  


RE: from_json()

2017-08-30 Thread JG Perrin
Hey Sam,

Nope – it does not work the way I want. I guess it is only working with one 
type…

Trying to convert:
{"releaseDate":147944880,"link":"http://amzn.to/2kup94P","id":1,"authorId":1,"title":"Fantastic Beasts and Where to Find Them: The Original Screenplay"}

I get:
[Executor task launch worker for task 3:ERROR] Logging$class: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.IllegalArgumentException: Failed to convert the JSON string '{"releaseDate":147944880,"link":"http://amzn.to/2kup94P","id":1,"authorId":1,"title":"Fantastic Beasts and Where to Find Them: The Original Screenplay"}' to a data type.
   at org.apache.spark.sql.types.DataType$.parseDataType(DataType.scala:176)
   at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:108)
   at org.apache.spark.sql.types.DataType.fromJson(DataType.scala)
   at net.jgp.labs.spark.l250_map.l031_dataset_book_json_in_progress.CsvToDatasetBookAsJson$BookMapper.call(CsvToDatasetBookAsJson.java:44)
   at net.jgp.labs.spark.l250_map.l031_dataset_book_json_in_progress.CsvToDatasetBookAsJson$BookMapper.call(CsvToDatasetBookAsJson.java:1)
   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
   at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
   at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
   at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
   at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
   at org.apache.spark.scheduler.Task.run(Task.scala:108)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   at java.lang.Thread.run(Unknown Source)




From: JG Perrin [mailto:jper...@lumeris.com]
Sent: Monday, August 28, 2017 1:29 PM
To: Sam Elamin <hussam.ela...@gmail.com>
Cc: user@spark.apache.org
Subject: RE: from_json()

Thanks Sam – this might be the solution. I will investigate!

From: Sam Elamin [mailto:hussam.ela...@gmail.com]
Sent: Monday, August 28, 2017 1:14 PM
To: JG Perrin <jper...@lumeris.com<mailto:jper...@lumeris.com>>
Cc: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Re: from_json()

Hi jg,

Perhaps I am misunderstanding you, but if you just want to create a new schema 
for a df, it's fairly simple, assuming you have a schema already predefined or 
in a string, i.e.

val newSchema = DataType.fromJson(json_schema_string)

then all you need to do is re-create the dataframe using this new schema

sqlContext.createDataFrame(oldDF.rdd,newSchema)
Regards
Sam

On Mon, Aug 28, 2017 at 5:57 PM, JG Perrin 
<jper...@lumeris.com<mailto:jper...@lumeris.com>> wrote:
Is there a way to avoid specifying a schema when using from_json(), or to 
infer the schema? When you read a JSON doc from disk, you can infer the 
schema. Should I write it to disk before (ouch)?

jg


This electronic transmission and any documents accompanying this electronic 
transmission contain confidential information belonging to the sender. This 
information may contain confidential health information that is legally 
privileged. The information is intended only for the use of the individual or 
entity named above. The authorized recipient of this transmission is prohibited 
from disclosing this information to any other party unless required to do so by 
law or regulation and is required to delete or destroy the information after 
its stated need has been fulfilled. If you are not the intended recipient, you 
are hereby notified that any disclosure, copying, distribution or the taking of 
any action in reliance on or regarding the contents of this electronically 
transmitted information is strictly prohibited. If you have received this 
E-mail in error, please notify the sender and delete this message immediately.



RE: from_json()

2017-08-28 Thread JG Perrin
Thanks Sam – this might be the solution. I will investigate!

From: Sam Elamin [mailto:hussam.ela...@gmail.com]
Sent: Monday, August 28, 2017 1:14 PM
To: JG Perrin <jper...@lumeris.com>
Cc: user@spark.apache.org
Subject: Re: from_json()

Hi jg,

Perhaps I am misunderstanding you, but if you just want to create a new schema 
for a df, it's fairly simple, assuming you have a schema already predefined or 
in a string, i.e.

val newSchema = DataType.fromJson(json_schema_string)

then all you need to do is re-create the dataframe using this new schema

sqlContext.createDataFrame(oldDF.rdd,newSchema)
Regards
Sam

On Mon, Aug 28, 2017 at 5:57 PM, JG Perrin 
<jper...@lumeris.com<mailto:jper...@lumeris.com>> wrote:
Is there a way to avoid specifying a schema when using from_json(), or to 
infer the schema? When you read a JSON doc from disk, you can infer the 
schema. Should I write it to disk before (ouch)?

jg





Re: from_json()

2017-08-28 Thread Sam Elamin
Hi jg,

Perhaps I am misunderstanding you, but if you just want to create a new
schema for a df, it's fairly simple, assuming you have a schema already
predefined or in a string, i.e.

val newSchema = DataType.fromJson(json_schema_string)

then all you need to do is re-create the dataframe using this new schema

sqlContext.createDataFrame(oldDF.rdd,newSchema)
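
For reference, a sketch of what `DataType.fromJson` expects (the schema string and field names here are hypothetical): it parses Spark's own schema-JSON representation, i.e. what `df.schema.json` emits, not an arbitrary JSON data record, which is why feeding it a data record fails with an IllegalArgumentException, as seen earlier in this thread.

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// Spark's schema-JSON format, as produced by df.schema.json.
// Passing a JSON *data* record here raises IllegalArgumentException.
val json_schema_string =
  """{"type":"struct","fields":[
    |{"name":"id","type":"long","nullable":true,"metadata":{}},
    |{"name":"title","type":"string","nullable":true,"metadata":{}}]}""".stripMargin

val newSchema = DataType.fromJson(json_schema_string).asInstanceOf[StructType]
```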

Regards
Sam

On Mon, Aug 28, 2017 at 5:57 PM, JG Perrin  wrote:

> Is there a way to avoid specifying a schema when using from_json(), or to
> infer the schema? When you read a JSON doc from disk, you can infer the
> schema. Should I write it to disk before (ouch)?
>
>
>
> jg
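
To avoid hand-writing the schema entirely, one option (a sketch, assuming Spark 2.2+, where `spark.read.json` accepts a `Dataset[String]`; `df` and `jsonCol` are hypothetical names) is to infer the schema from a small sample of the JSON strings and then reuse it with `from_json`, with no round trip to disk:

```scala
import org.apache.spark.sql.functions.{col, from_json}
import spark.implicits._  // for .as[String]

// Infer the schema from a sample of the raw JSON strings, then apply
// it to the full column with from_json.
val sample = df.select("jsonCol").limit(100).as[String]
val inferredSchema = spark.read.json(sample).schema

val parsed = df.withColumn("parsed", from_json(col("jsonCol"), inferredSchema))
```

Inferring from a sample keeps the inference cost bounded, but any field that never appears in the sample will be missing from the schema.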