Re: from_json function

2018-08-16 Thread dbolshak
Maxim, thanks for your reply.

I've left a comment on the following JIRA issue:
https://issues.apache.org/jira/browse/SPARK-23194?focusedCommentId=16582025&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16582025






from_json function

2018-08-15 Thread dbolshak
Hello community,

I cannot get the from_json function to work with the
"columnNameOfCorruptRecord" option.
```
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._  // assumes a SparkSession named `spark`

val data = Seq(
  "{'number': 1}",  // valid record
  "{'number': }"    // malformed record
)

val schema = new StructType()
  .add($"number".int)
  .add($"_corrupt_record".string)

val sourceDf = data.toDF("column")

val jsonedDf = sourceDf
  .select(from_json(
    $"column",
    schema,
    Map("mode" -> "PERMISSIVE",
        "columnNameOfCorruptRecord" -> "_corrupt_record")
  ) as "data")
  .selectExpr("data.number", "data._corrupt_record")

jsonedDf.show()
```
Can anybody help me get `_corrupt_record` to come out non-empty?
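
For comparison, the batch JSON reader seems to honour this option; a minimal
sketch of what I mean (assuming the same `spark` session, `data` and `schema`
as above, and Spark 2.2+ where `spark.read.json` accepts a `Dataset[String]`):
```
// The DataFrameReader JSON source does populate the corrupt-record
// column when it is declared in the schema, as far as I can tell.
val ds = data.toDS()

val parsed = spark.read
  .schema(schema)  // includes the _corrupt_record field
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .json(ds)

parsed.show()  // here _corrupt_record holds the malformed line
```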

Thanks in advance.






Re: Pasting into spark-shell doesn't work for Databricks example

2016-11-22 Thread dbolshak
Hello,

We have the same issue.

We use the latest release, 2.0.2. The same setup with 1.6.1 works fine.

Could somebody suggest a workaround?
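
The only partial workaround we have found so far is the Scala REPL's `:paste`
mode, which feeds the whole snippet to the interpreter at once instead of line
by line. I am not sure it covers the Databricks example, but for multi-line
definitions it helps; the snippet below is a hypothetical illustration:
```
scala> :paste
// Entering paste mode (ctrl-D to finish)

// a multi-line snippet that trips up line-by-line pasting
case class Point(x: Int, y: Int)
val points = Seq(Point(1, 2), Point(3, 4))

// press Ctrl-D here
// Exiting paste mode, now interpreting.
```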

Kind regards,
Denis






spark with kerberos

2016-10-13 Thread dbolshak
Hello community,

We have a challenge and no idea how to solve it.

The problem:

Say we have the following environment:
1. `cluster A`: does not use Kerberos; we use it as a source of data.
Importantly, we do not manage this cluster.
2. `cluster B`: a small cluster where our Spark application runs and performs
its logic. We manage this cluster, and it does not have Kerberos.
3. `cluster C`: uses Kerberos; we use it to store the results of our Spark
application. We manage this cluster.

Our requirements and conditions, not yet mentioned:
1. All clusters are in a single data center, but in different subnetworks.
2. We cannot turn on Kerberos on `cluster A`.
3. We cannot turn off Kerberos on `cluster C`.
4. We can turn Kerberos on or off on `cluster B`; currently it is turned off.
5. The Spark app is built on top of the RDD API and does not depend on
spark-sql.

Does anybody know how to write data via the RDD API to a remote cluster that
runs Kerberos?
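
The only direction we have come up with so far is to log in to the kerberized
cluster with a keytab via the Hadoop UserGroupInformation API before writing.
A rough sketch follows; the principal, keytab path and namenode URI are
placeholders, `resultRdd` stands for the RDD we want to persist, and we are
not sure the obtained credentials propagate from the driver to the executors:
```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation
import java.security.PrivilegedExceptionAction

// Configure Hadoop security and log in with a keytab (placeholders).
val hadoopConf = new Configuration()
hadoopConf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(hadoopConf)

val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
  "app@EXAMPLE.COM", "/path/to/app.keytab")

// Run the write as the authenticated user.
ugi.doAs(new PrivilegedExceptionAction[Unit] {
  override def run(): Unit = {
    resultRdd.saveAsTextFile("hdfs://cluster-c-nn:8020/results/run-001")
  }
})
```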

-- 
//with Best Regards
--Denis Bolshakov
e-mail: bolshakov.de...@gmail.com


