Some more info:
It seems this is caused by a nested data structure (a case class wrapping another case class).
Consider the following simple example:
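import org.apache.spark.sql.{DataFrame, SparkSession}
// In compiled code, define the case classes at top level so toDF can derive an encoder
case class A(v: Int)
case class B(v: A)  // B nests A, so the column is written as a struct
// Assumption: no session exists yet; in spark-shell `spark` is already defined
val spark = SparkSession.builder().master("local[*]").appName("parquet-warning").getOrCreate()
import spark.implicits._  // required for .toDF
val filename = "test"
val a = A(1)
val b = B(a)
val df1: DataFrame = Seq[B](b).toDF
df1.write.parquet(filename)  // default save mode fails if the path already exists
val df2 = spark.read.parquet(filename)
df2.show()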
Any ideas?
Thanks,
Assaf.
From: Mendelson, Assaf [mailto:assaf.mendel...@rsa.com]
Sent: Thursday, May 25, 2017 9:55 AM
To: user@spark.apache.org
Subject: strange warning
Hi all,
Today, I got the following warning:
[WARN] org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize
counter due to context is not a instance of TaskInputOutputContext, but is
org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
This occurs in one of my tests but not in the others (all of them use Parquet). I found
https://issues.apache.org/jira/browse/PARQUET-220, but I am using Spark
2.1.0, which uses Parquet 1.8 if I am not mistaken. I also found
https://issues.apache.org/jira/browse/SPARK-8118, but again, it is very old.
Also, it happens in only one of the cases where I save my Parquet files, not in the others.
Does anyone know what it means and how to get rid of it?
Thanks,
Assaf.
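
On the "how to get rid of it" part, one possible workaround is to raise the log level for the Parquet loggers so that WARN messages are dropped. The sketch below assumes the message is routed through log4j 1.x, the logging backend bundled with Spark 2.1; that has not been verified for this setup. The object name QuietParquet is hypothetical, and "org.apache.parquet" is simply the package prefix of the logger named in the warning.

```scala
import org.apache.log4j.{Level, Logger}

object QuietParquet {
  // Assumption: Parquet's messages end up in log4j 1.x.
  // Raising the level to ERROR drops WARN and below for every logger
  // under the org.apache.parquet package, including ParquetRecordReader.
  def silence(): Unit =
    Logger.getLogger("org.apache.parquet").setLevel(Level.ERROR)
}
```

Calling QuietParquet.silence() once before the read should be enough if the assumption holds. Note that this only hides the message; it does not explain why the nested case triggers it.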