[jira] [Updated] (SPARK-22459) EdgeDirection "Either" Does Not Consider the Real "Either" Direction

2017-11-06 Thread Tom (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom updated SPARK-22459:

Description: 
When running functions that take an 
[EdgeDirection|https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/EdgeDirection.scala],
 the _EdgeDirection.Either_ value does not actually behave as an "either" 
direction.
For instance, when using the _Pregel API_ with the simple graph shown here:
!http://www.icodeguru.com/CPP/10book/books/book9/images/fig6_1.gif!
With _EdgeDirection.Either_ you would expect vertex 3 to send a message 
(when activated) to vertices 1, 2, and 4, but in practice it does not.
This can be worked around, but only in an expensive and inefficient way. 
Tested with Spark 2.2.0, 2.1.1, and 2.1.2.
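To make the expected semantics concrete, here is a plain-Java sketch (no Spark involved) of the neighbor set an "either" direction should produce for the pictured graph. The directed edge list below is an assumption, since only the figure fixes the orientations; the point is that vertex 3's "either" neighborhood should be {1, 2, 4} regardless of how each edge is oriented.

```java
import java.util.Set;
import java.util.TreeSet;

class EitherDirectionDemo {
    // Hypothetical orientations consistent with the figure: 1->3, 2->3, 3->4.
    static final long[][] EDGES = {{1, 3}, {2, 3}, {3, 4}};

    // Neighbors of v when edges are followed in either direction -- the
    // behavior one would expect from EdgeDirection.Either.
    static Set<Long> eitherNeighbors(long v) {
        Set<Long> out = new TreeSet<>();
        for (long[] e : EDGES) {
            if (e[0] == v) out.add(e[1]);  // outgoing edge: message to dst
            if (e[1] == v) out.add(e[0]);  // incoming edge: message to src
        }
        return out;
    }

    public static void main(String[] args) {
        // Vertex 3 should message vertices 1, 2 and 4 under "either" semantics.
        System.out.println(eitherNeighbors(3)); // [1, 2, 4]
    }
}
```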

  was:
When running functions that take an 
[EdgeDirection|https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/EdgeDirection.scala],
 the _EdgeDirection.Either_ value does not actually behave as an "either" 
direction.
For instance, when using the _Pregel API_ with the simple graph shown here:
!http://www.icodeguru.com/CPP/10book/books/book9/images/fig6_1.gif|thumbnail!
With _EdgeDirection.Either_ you would expect vertex 3 to send a message 
(when activated) to vertices 1, 2, and 4, but in practice it does not.
This can be worked around, but only in an expensive and inefficient way. 
Tested with Spark 2.2.0, 2.1.1, and 2.1.2.


> EdgeDirection "Either" Does Not Consider the Real "Either" Direction
> ---
>
> Key: SPARK-22459
> URL: https://issues.apache.org/jira/browse/SPARK-22459
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 2.1.1, 2.1.2, 2.2.0
> Environment: Windows 7. YARN cluster / client mode. Tested with 
> Spark 2.2.0, 2.1.1 and 2.1.2.
>Reporter: Tom
>
> When running functions that take an 
> [EdgeDirection|https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/EdgeDirection.scala],
>  the _EdgeDirection.Either_ value does not actually behave as an "either" 
> direction.
> For instance, when using the _Pregel API_ with the simple graph shown here:
> !http://www.icodeguru.com/CPP/10book/books/book9/images/fig6_1.gif!
> With _EdgeDirection.Either_ you would expect vertex 3 to send a 
> message (when activated) to vertices 1, 2, and 4, but in practice it does not.
> This can be worked around, but only in an expensive and inefficient way. 
> Tested with Spark 2.2.0, 2.1.1, and 2.1.2.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-22459) EdgeDirection "Either" Does Not Consider the real "either" direction

2017-11-06 Thread Tom (JIRA)
Tom created SPARK-22459:
---

 Summary: EdgeDirection "Either" Does Not Consider the real "either" 
direction
 Key: SPARK-22459
 URL: https://issues.apache.org/jira/browse/SPARK-22459
 Project: Spark
  Issue Type: Bug
  Components: GraphX
Affects Versions: 2.2.0, 2.1.2, 2.1.1
 Environment: Windows 7. YARN cluster / client mode. Tested with Spark 2.2.0, 
2.1.1 and 2.1.2.
Reporter: Tom


When running functions that take an 
[EdgeDirection|https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/EdgeDirection.scala],
 the _EdgeDirection.Either_ value does not actually behave as an "either" 
direction.
For instance, when using the _Pregel API_ with the simple graph shown here:
!http://www.icodeguru.com/CPP/10book/books/book9/images/fig6_1.gif!
With _EdgeDirection.Either_ you would expect vertex 3 to send a message 
(when activated) to vertices 1, 2, and 4, but in practice it does not.
This can be worked around, but only in an expensive and inefficient way.

[jira] [Commented] (SPARK-21402) Java encoders - switch fields on collectAsList

2017-08-04 Thread Tom (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114179#comment-16114179
 ] 

Tom commented on SPARK-21402:
-

An additional comment: when there are multiple data types that cannot easily be 
converted to one another (e.g. double <-> string), the JVM crashes outright (and 
the instructions in the error message did not help) - 

{code:bash}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0x00011be827bc, pid=992, tid=0x1c03
#
# JRE version: Java(TM) SE Runtime Environment (8.0_121-b13) (build 
1.8.0_121-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.121-b13 mixed mode bsd-amd64 
compressed oops)
# Problematic frame:
# v  ~StubRoutines::jlong_disjoint_arraycopy
#
# Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
{code}



> Java encoders - switch fields on collectAsList
> --
>
> Key: SPARK-21402
> URL: https://issues.apache.org/jira/browse/SPARK-21402
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: mac os
> spark 2.1.1
> Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_121
>Reporter: Tom
>
> I have the following schema in a dataset -
> root
>  |-- userId: string (nullable = true)
>  |-- data: map (nullable = true)
>  |    |-- key: string
>  |    |-- value: struct (valueContainsNull = true)
>  |    |    |-- startTime: long (nullable = true)
>  |    |    |-- endTime: long (nullable = true)
>  |-- offset: long (nullable = true)
>  And I have the following classes (plus setters and getters, which I omitted 
> for simplicity) -
>  
> {code:java}
> public class MyClass {
>     private String userId;
>     private Map<String, MyDTO> data;
>     private Long offset;
> }
> public class MyDTO {
>     private long startTime;
>     private long endTime;
> }
> {code}
> I collect the result the following way - 
> {code:java}
> Encoder<MyClass> myClassEncoder = Encoders.bean(MyClass.class);
> Dataset<MyClass> results = raw_df.as(myClassEncoder);
> List<MyClass> lst = results.collectAsList();
> {code}
> 
> I do several calculations to get the result I want, and the result is correct 
> all the way through before I collect it.
> This is the result for - 
> {code:java}
> results.select(results.col("data").getField("2017-07-01").getField("startTime")).show(false);
> {code}
> +--------------------------+------------------------+
> |data[2017-07-01].startTime|data[2017-07-01].endTime|
> +--------------------------+------------------------+
> |1498854000                |1498870800              |
> +--------------------------+------------------------+
> This is the result after collecting the results, for - 
> {code:java}
> MyClass userData = results.collectAsList().get(0);
> MyDTO userDTO = userData.getData().get("2017-07-01");
> System.out.println("userDTO startTime: " + userDTO.getStartTime());
> System.out.println("userDTO endTime: " + userDTO.getEndTime());
> {code}
> --
> data startTime: 1498870800
> data endTime: 1498854000
> I tend to believe it is a Spark issue. Would love any suggestions on how to 
> bypass it.
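One plausible mechanism for the swap, sketched with the JDK alone: Encoders.bean discovers fields via java.beans introspection, which reports properties in alphabetical order (endTime before startTime), while the schema in the report declares startTime first. The snippet only demonstrates the introspection order; that this is the actual root cause inside Spark is an assumption, not confirmed here.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

class BeanOrderDemo {
    public static class MyDTO {
        private long startTime;
        private long endTime;
        public long getStartTime() { return startTime; }
        public void setStartTime(long v) { startTime = v; }
        public long getEndTime() { return endTime; }
        public void setEndTime(long v) { endTime = v; }
    }

    // Bean properties as the JDK Introspector reports them (stopping at Object).
    static List<String> propertyOrder(Class<?> c) {
        try {
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor p :
                    Introspector.getBeanInfo(c, Object.class).getPropertyDescriptors()) {
                names.add(p.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The schema declares startTime first, but introspection is alphabetical.
        System.out.println(propertyOrder(MyDTO.class)); // [endTime, startTime]
    }
}
```

If the encoder binds columns positionally against that alphabetical list, the two long fields would land swapped, which matches the observed output.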






[jira] [Updated] (SPARK-21402) Java encoders - switch fields on collectAsList

2017-07-25 Thread Tom (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom updated SPARK-21402:

Priority: Major  (was: Minor)




[jira] [Commented] (SPARK-21402) Java encoders - switch fields on collectAsList

2017-07-21 Thread Tom (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096162#comment-16096162
 ] 

Tom commented on SPARK-21402:
-

I have a small insight regarding this possible issue, as I bumped into something 
that appeared similar.
If the class we are creating instances of has a method getX where X is not a 
member of the class, it seems that something gets messed up.
E.g. I had another class with a Long data member and a static data member of 
type Double, and a method like the following - 

{code:java}
public double getX() {
    return longDataMember / staticDouble;
}
{code}

This alone was enough to mess up my class and to assign the value 103079215144 
to every longDataMember.
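This insight can be reproduced with plain JDK introspection: a computed getter with no backing field is still reported as a bean property, so a bean-based encoder would see an extra "column" to bind. The class below (Sample, SCALE) is hypothetical, mirroring the description above.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

class DerivedGetterDemo {
    public static class Sample {
        private static final double SCALE = 2.0; // stand-in for the static Double
        private long longDataMember;
        public long getLongDataMember() { return longDataMember; }
        public void setLongDataMember(long v) { longDataMember = v; }
        // No field named "x" exists; this computed getter is still
        // exposed to introspection as a property called "x".
        public double getX() { return longDataMember / SCALE; }
    }

    // All bean property names the Introspector finds (stopping at Object).
    static List<String> propertyNames(Class<?> c) {
        try {
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor p :
                    Introspector.getBeanInfo(c, Object.class).getPropertyDescriptors()) {
                names.add(p.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The phantom property "x" shows up alongside the real field.
        System.out.println(propertyNames(Sample.class)); // [longDataMember, x]
    }
}
```

An encoder that tries to map row values onto that property list would then be off by one slot, which would explain the corrupted longDataMember values.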




[jira] [Commented] (SPARK-21402) Java encoders - switch fields on collectAsList

2017-07-13 Thread Tom (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086179#comment-16086179
 ] 

Tom commented on SPARK-21402:
-

Sorry, I was trying to hide my original code so it would be more understandable 
(I have changed it now). It reads the class it is supposed to read, but when the 
result is collected, the fields are mixed up. 




[jira] [Updated] (SPARK-21402) Java encoders - switch fields on collectAsList

2017-07-13 Thread Tom (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom updated SPARK-21402:

Description: 
I have the following schema in a dataset -

root
 |-- userId: string (nullable = true)
 |-- data: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- startTime: long (nullable = true)
 |    |    |-- endTime: long (nullable = true)
 |-- offset: long (nullable = true)


 And I have the following classes (plus setters and getters, which I omitted for 
simplicity) -

{code:java}
public class MyClass {
    private String userId;
    private Map<String, MyDTO> data;
    private Long offset;
}

public class MyDTO {
    private long startTime;
    private long endTime;
}
{code}


I collect the result the following way - 


{code:java}
Encoder<MyClass> myClassEncoder = Encoders.bean(MyClass.class);
Dataset<MyClass> results = raw_df.as(myClassEncoder);
List<MyClass> lst = results.collectAsList();
{code}

I do several calculations to get the result I want, and the result is correct 
all the way through before I collect it.
This is the result for - 


{code:java}
results.select(results.col("data").getField("2017-07-01").getField("startTime")).show(false);

{code}

+--------------------------+------------------------+
|data[2017-07-01].startTime|data[2017-07-01].endTime|
+--------------------------+------------------------+
|1498854000                |1498870800              |
+--------------------------+------------------------+


This is the result after collecting the results, for - 


{code:java}
MyClass userData = results.collectAsList().get(0);
MyDTO userDTO = userData.getData().get("2017-07-01");
System.out.println("userDTO startTime: " + userDTO.getStartTime());
System.out.println("userDTO endTime: " + userDTO.getEndTime());

{code}

--
data startTime: 1498870800
data endTime: 1498854000

I tend to believe it is a Spark issue. Would love any suggestions on how to 
bypass it.

  was:
I have the following schema in a dataset -

root
 |-- userId: string (nullable = true)
 |-- data: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- startTime: long (nullable = true)
 |    |    |-- endTime: long (nullable = true)
 |-- offset: long (nullable = true)


 And I have the following classes (plus setters and getters, which I omitted for 
simplicity) -

{code:java}
public class MyClass {
    private String userId;
    private Map<String, MyDTO> data;
    private Long offset;
}

public class MyDTO {
    private long startTime;
    private long endTime;
}
{code}


I collect the result the following way - 


{code:java}
Encoder<MyClass> myClassEncoder = Encoders.bean(MyClass.class);
Dataset<MyClass> results = raw_df.as(myClassEncoder);
List<MyClass> lst = results.collectAsList();
{code}

I do several calculations to get the result I want, and the result is correct 
all the way through before I collect it.
This is the result for - 


{code:java}
results.select(results.col("data").getField("2017-07-01").getField("startTime")).show(false);

{code}

+--------------------------+------------------------+
|data[2017-07-01].startTime|data[2017-07-01].endTime|
+--------------------------+------------------------+
|1498854000                |1498870800              |
+--------------------------+------------------------+


This is the result after collecting the results, for - 


{code:java}
MyClass userData = results.collectAsList().get(0);
MyDTO userDTO = userData.getData().get("2017-07-01");
System.out.println("userDTO startTime: " + userDTO.getSleepStartTime());
System.out.println("userDTO endTime: " + userDTO.getSleepEndTime());

{code}

--
data startTime: 1498870800
data endTime: 1498854000

I tend to believe it is a Spark issue. Would love any suggestions on how to 
bypass it.



[jira] [Updated] (SPARK-21402) Java encoders - switch fields on collectAsList

2017-07-13 Thread Tom (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom updated SPARK-21402:

Description: 
I have the following schema in a dataset:

root
 |-- userId: string (nullable = true)
 |-- data: map (nullable = true)
 ||-- key: string
 ||-- value: struct (valueContainsNull = true)
 |||-- startTime: long (nullable = true)
 |||-- endTime: long (nullable = true)
 |-- offset: long (nullable = true)


 And I have the following classes (plus setters and getters, which I omitted for 
simplicity):


 
{code:java}
public class MyClass {
    private String userId;
    private Map<String, MyDTO> data;
    private Long offset;
}

public class MyDTO {
    private long startTime;
    private long endTime;
}
{code}


I collect the result the following way:


{code:java}
Encoder<MyClass> myClassEncoder = Encoders.bean(MyClass.class);
Dataset<MyClass> results = raw_df.as(myClassEncoder);
List<MyClass> lst = results.collectAsList();
{code}

I do several calculations to get the result I want, and the result is correct 
all the way through before I collect it.
This is the result of:


{code:java}
results.select(results.col("data").getField("2017-07-01").getField("startTime")).show(false);

{code}

+--------------------------+------------------------+
|data[2017-07-01].startTime|data[2017-07-01].endTime|
+--------------------------+------------------------+
|1498854000                |1498870800              |
+--------------------------+------------------------+


This is the result after collecting the results of:


{code:java}
MyClass userData = results.collectAsList().get(0);
MyDTO userDTO = userData.getData().get("2017-07-01");
System.out.println("userDTO startTime: " + userDTO.getStartTime());
System.out.println("userDTO endTime: " + userDTO.getEndTime());
{code}

--
userDTO startTime: 1498870800
userDTO endTime: 1498854000

I tend to believe this is a Spark issue. I would love any suggestions on how to 
work around it.
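One plausible explanation (my assumption, not confirmed by this report): `Encoders.bean` derives its deserialization layout from JavaBeans introspection, which enumerates properties alphabetically (`endTime` before `startTime`), while the DataFrame struct lists `startTime` first, so a positional read swaps the two values. A minimal, Spark-free sketch of that introspection order, using a hypothetical `BeanOrderDemo` class with the same field layout as `MyDTO`:

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

public class BeanOrderDemo {
    // Same shape as MyDTO: declared startTime first, endTime second.
    public static class MyDTO {
        private long startTime;
        private long endTime;
        public long getStartTime() { return startTime; }
        public void setStartTime(long v) { startTime = v; }
        public long getEndTime() { return endTime; }
        public void setEndTime(long v) { endTime = v; }
    }

    public static List<String> propertyOrder(Class<?> cls) throws Exception {
        // Introspector reports properties sorted by name, ignoring declaration order.
        BeanInfo info = Introspector.getBeanInfo(cls, Object.class);
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            names.add(pd.getName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Prints [endTime, startTime]: alphabetical, not declaration order.
        System.out.println(propertyOrder(MyDTO.class));
    }
}
```

If that is indeed the cause, reading the values by field name (as the `results.select(...).getField("startTime")` query above does) rather than positionally via the collected bean should be immune to the swap.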



[jira] [Created] (SPARK-21402) Java encoders - switch fields on collectAsList

2017-07-13 Thread Tom (JIRA)
Tom created SPARK-21402:
---

 Summary: Java encoders - switch fields on collectAsList
 Key: SPARK-21402
 URL: https://issues.apache.org/jira/browse/SPARK-21402
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.1
 Environment: mac os
spark 2.1.1
Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_121
Reporter: Tom
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org