RE: get corrupted rows using columnNameOfCorruptRecord

2016-12-12 Thread Yehuda Finkelstein
Let me extend the suggestion a bit more verbosely. I think you could try something like this: val jsonDF = spark.read.option("columnNameOfCorruptRecord", "xxx") ...
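Spelled out, a minimal sketch of that suggestion, assuming Spark 2.x with a SparkSession named spark; the data fields and input path are placeholders, not from the thread:

    import org.apache.spark.sql.types._

    // Declare the corrupt-record column ("xxx") as a StringType field so
    // Spark has somewhere to put the raw text of any malformed record.
    val schema = new StructType()
      .add("id", LongType)      // placeholder data fields
      .add("name", StringType)
      .add("xxx", StringType)   // receives the raw malformed JSON line

    val jsonDF = spark.read
      .option("columnNameOfCorruptRecord", "xxx")
      .schema(schema)
      .json("/path/to/input.json")   // placeholder path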

Re: get corrupted rows using columnNameOfCorruptRecord

2016-12-07 Thread Hyukjin Kwon
> at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:58)
> at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
> at org.apache.spark.sql.Dataset.org$apache$spark$sql$D...
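A stack trace like this (a failure during query analysis) usually means the corrupt-record column could not be resolved: when you pass an explicit schema, it must itself contain a StringType field with the name given to columnNameOfCorruptRecord. A sketch of that fix, assuming df_schema is the DataFrame used only for schema inference, as in the original question:

    import org.apache.spark.sql.types.{StringType, StructType}

    // The inferred schema has no "xxx" field; append it before re-reading,
    // otherwise filtering on "xxx" fails analysis (typically "cannot resolve").
    val schemaWithCorrupt: StructType = df_schema.schema.add("xxx", StringType)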

RE: get corrupted rows using columnNameOfCorruptRecord

2016-12-07 Thread Yehuda Finkelstein
.where("xxx IS NOT NULL") will give you the rows that couldn't be parsed. On Tue, Dec 6, 2016 at 6:31 AM, Yehuda Finkelstein <yeh...@veracity-group.com> wrote: Hi ...

Re: get corrupted rows using columnNameOfCorruptRecord

2016-12-06 Thread Michael Armbrust
.where("xxx IS NOT NULL") will give you the rows that couldn't be parsed.

On Tue, Dec 6, 2016 at 6:31 AM, Yehuda Finkelstein <yeh...@veracity-group.com> wrote:
> Hi all
>
> I'm trying to parse JSON using an existing schema and I'm getting rows with NULLs.
>
> //get schema
> val df_schema = ...
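Putting the two replies together, a short end-to-end sketch (jsonDF as constructed above; "xxx" is whatever name was passed to columnNameOfCorruptRecord):

    // Rows Spark could not parse against the schema: the declared fields
    // come back NULL and the raw record text lands in "xxx".
    val corruptRows = jsonDF.where("xxx IS NOT NULL")

    // Rows that parsed cleanly; drop the bookkeeping column afterwards.
    val goodRows = jsonDF.where("xxx IS NULL").drop("xxx")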