[ 
https://issues.apache.org/jira/browse/SPARK-35320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345782#comment-17345782
 ] 

Pablo Langa Blanco commented on SPARK-35320:
--------------------------------------------

I'm taking a look at this

> from_json cannot parse maps with timestamp as key
> -------------------------------------------------
>
>                 Key: SPARK-35320
>                 URL: https://issues.apache.org/jira/browse/SPARK-35320
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.1, 3.1.1
>         Environment: * Java 11
>  * Spark 3.0.1/3.1.1
>  * Scala 2.12
>            Reporter: Vincenzo Cerminara
>            Priority: Minor
>
> I have a json that contains a {{map<timestamp,string>}} like the following
> {code:json}
> {
>   "map": {
>     "2021-05-05T20:05:08": "sampleValue"
>   }
> }
> {code}
> The key of the map is a string containing a formatted timestamp and I want to 
> parse it as a Java {{{{Map<Instant,String>}}}} using the {{{{from_json}}}} 
> Spark SQL function (see the {{Sample}} class in the code below).
> {code:java}
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Encoders;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SparkSession;
> import java.io.Serializable;
> import java.time.Instant;
> import java.util.List;
> import java.util.Map;
> import static org.apache.spark.sql.functions.*;
> public class TimestampAsJsonMapKey {
>     public static class Sample implements Serializable {
>         private Map<Instant, String> map;
>         
>         public Map<Instant, String> getMap() {
>             return map;
>         }
>         
>         public void setMap(Map<Instant, String> map) {
>             this.map = map;
>         }
>     }
>     public static class InvertedSample implements Serializable {
>         private Map<String, Instant> map;
>         
>         public Map<String, Instant> getMap() {
>             return map;
>         }
>         
>         public void setMap(Map<String, Instant> map) {
>             this.map = map;
>         }
>     }
>     public static void main(String[] args) {
>         final SparkSession spark = SparkSession
>                 .builder()
>                 .appName("Timestamp As Json Map Key Test")
>                 .master("local[1]")
>                 .getOrCreate();
>         workingTest(spark);
>         notWorkingTest(spark);
>     }
>     private static void workingTest(SparkSession spark) {
>         //language=JSON
>         final String invertedSampleJson = "{ \"map\": { \"sampleValue\": 
> \"2021-05-05T20:05:08\" } }";
>         final Dataset<String> samplesDf = 
> spark.createDataset(List.of(invertedSampleJson), Encoders.STRING());
>         final Dataset<Row> parsedDf = 
> samplesDf.select(from_json(col("value"), 
> Encoders.bean(InvertedSample.class).schema()));
>         parsedDf.show(false);
>     }
>     private static void notWorkingTest(SparkSession spark) {
>         //language=JSON
>         final String sampleJson = "{ \"map\": { \"2021-05-05T20:05:08\": 
> \"sampleValue\" } }";
>         final Dataset<String> samplesDf = 
> spark.createDataset(List.of(sampleJson), Encoders.STRING());
>         final Dataset<Row> parsedDf = 
> samplesDf.select(from_json(col("value"), 
> Encoders.bean(Sample.class).schema()));
>         parsedDf.show(false);
>     }
> }
> {code}
> When I run the {{notWorkingTest}} method it fails with the following 
> exception:
> {noformat}
> Exception in thread "main" java.lang.ClassCastException: class 
> org.apache.spark.unsafe.types.UTF8String cannot be cast to class 
> java.lang.Long (org.apache.spark.unsafe.types.UTF8String is in unnamed module 
> of loader 'app'; java.lang.Long is in module java.base of loader 'bootstrap')
>       at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
>       at 
> org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToString$8$adapted(Cast.scala:297)
>       at 
> org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:285)
>       at 
> org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToString$7(Cast.scala:297)
>       at 
> org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToString$12(Cast.scala:329)
>       at 
> org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:285)
>       at 
> org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToString$11(Cast.scala:321)
>       at 
> org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToString$14(Cast.scala:359)
>       at 
> org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:285)
>       at 
> org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToString$13(Cast.scala:352)
>       at 
> org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:815)
>       at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:461)
>       at 
> org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:156)
>       at 
> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:83)
>       at 
> org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation$$anonfun$apply$17.$anonfun$applyOrElse$71(Optimizer.scala:1508)
> {noformat}
> It seems that if the a {{timestamp}} is the key in a map it must necessarily 
> be a of type long, and cannot be of type {{string}}.
>  
> ----
>  In the {{workingTest}} method, instead, I have an inverted map (the 
> timestamp appears as the value in this case, and not as the key) and it works 
> correctly



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to