[ https://issues.apache.org/jira/browse/SPARK-21402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom updated SPARK-21402:
------------------------
Description:
I have the following schema in a dataset:

root
 |-- userId: string (nullable = true)
 |-- data: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- startTime: long (nullable = true)
 |    |    |-- endTime: long (nullable = true)
 |-- offset: long (nullable = true)

And I have the following classes (plus the setters and getters, which I omitted for simplicity):
{code:java}
public class MyClass {
    private String userId;
    private Map<String, MyDTO> data;
    private Long offset;
}

public class MyDTO {
    private long startTime;
    private long endTime;
}
{code}
I collect the result the following way:
{code:java}
Encoder<MyClass> myClassEncoder = Encoders.bean(MyClass.class);
Dataset<MyClass> results = raw_df.as(myClassEncoder);
List<MyClass> lst = results.collectAsList();
{code}
I do several calculations to get the result I want, and the result is correct all the way through, right up until I collect it. This is the output of:
{code:java}
results.select(
    results.col("data").getField("2017-07-01").getField("startTime"),
    results.col("data").getField("2017-07-01").getField("endTime")
).show(false);
{code}
+--------------------------+------------------------+
|data[2017-07-01].startTime|data[2017-07-01].endTime|
+--------------------------+------------------------+
|1498854000                |1498870800              |
+--------------------------+------------------------+

And this is the output after collecting the results:
{code:java}
MyClass userData = results.collectAsList().get(0);
MyDTO userDTO = userData.getData().get("2017-07-01");
System.out.println("userDTO startTime: " + userDTO.getStartTime());
System.out.println("userDTO endTime: " + userDTO.getEndTime());
{code}
userDTO startTime: 1498870800
userDTO endTime: 1498854000

The startTime and endTime values are swapped after collecting. I tend to believe it is a Spark issue. I would love any suggestions on how to work around it.
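A note on what may be happening (this is my assumption, not confirmed from Spark internals): Encoders.bean discovers bean properties via java.beans.Introspector, which reports them in alphabetical order rather than declaration order. For MyDTO that yields [endTime, startTime], while the Dataset schema lists startTime first, which would line up exactly with the swap observed above. A minimal, Spark-free sketch that shows the Introspector ordering (class name reused from this report):
{code:java}
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

public class BeanOrderDemo {

    // Same shape as MyDTO above, with the omitted getters/setters spelled out.
    public static class MyDTO {
        private long startTime;
        private long endTime;
        public long getStartTime() { return startTime; }
        public void setStartTime(long v) { startTime = v; }
        public long getEndTime() { return endTime; }
        public void setEndTime(long v) { endTime = v; }
    }

    // Returns the bean property names in the order the Introspector reports them.
    public static List<String> propertyOrder(Class<?> beanClass) throws IntrospectionException {
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor pd : Introspector.getBeanInfo(beanClass).getPropertyDescriptors()) {
            if (!"class".equals(pd.getName())) { // drop the implicit getClass() property
                names.add(pd.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IntrospectionException {
        // Alphabetical, not declaration order: endTime comes back before startTime.
        System.out.println(propertyOrder(MyDTO.class));
    }
}
{code}
If that is indeed the cause, one possible workaround until it is fixed would be to reorder the struct fields in the DataFrame so the schema matches the alphabetical property order (endTime before startTime) before applying the bean encoder, or to read the values out of Row objects by field name instead of going through Encoders.bean.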
> Java encoders - switch fields on collectAsList
> ----------------------------------------------
>
>                 Key: SPARK-21402
>                 URL: https://issues.apache.org/jira/browse/SPARK-21402
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1
>        Environment: mac os
>                     spark 2.1.1
>                     Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_121
>            Reporter: Tom
>            Priority: Minor

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)