[jira] [Updated] (SPARK-26879) Inconsistency in default column names for functions like inline and stack

2019-02-14 Thread Jash Gala (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jash Gala updated SPARK-26879:
--
Description: 
In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
columns).

Ex:
{code|borderStyle=solid}
scala> spark.sql("SELECT stack(2, 1, 2, 3)").show

| col0 | col1 |
|1 | 2|
|3 | null |

scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
'b')))").show

| col1 | col2 |
|1 | a|
|2 | b|

{/code}

This feels like an issue with consistency. As discussed on [PR 
#23748|https://github.com/apache/spark/pull/23748], it might be a good idea to 
standardize this to something specific (like zero-based indexing) for these and 
other similar functions.
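
Until the defaults are standardized, one way to avoid depending on them at all is to rename the generator output explicitly. A minimal spark-shell sketch (not from the ticket itself), assuming the `spark` SparkSession is in scope; the target names `a`/`b` are arbitrary:

{code}
// Sketch only: rename the generated columns explicitly so downstream code
// never sees the inconsistent defaults (assumes a spark-shell session).
val stacked = spark.sql("SELECT stack(2, 1, 2, 3)").toDF("a", "b")
stacked.show()

val inlined = spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b')))").toDF("a", "b")
inlined.show()
{code}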



  was:
In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
columns).

Ex:
```
scala> spark.sql("SELECT stack(2, 1, 2, 3)").show

| col0 | col1 |
|1 | 2|
|3 | null |

scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
'b')))").show

| col1 | col2 |
|1 | a|
|2 | b|
```
This feels like an issue with consistency. As discussed on [PR 
#23748|https://github.com/apache/spark/pull/23748], it might be a good idea to 
standardize this to something specific (like zero-based indexing) for these and 
other similar functions.


> Inconsistency in default column names for functions like inline and stack
> -
>
> Key: SPARK-26879
> URL: https://issues.apache.org/jira/browse/SPARK-26879
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Jash Gala
>Priority: Minor
>
> In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
> 1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
> columns).
> Ex:
> {code|borderStyle=solid}
> scala> spark.sql("SELECT stack(2, 1, 2, 3)").show
> +----+----+
> |col0|col1|
> +----+----+
> |   1|   2|
> |   3|null|
> +----+----+
> scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b')))").show
> +----+----+
> |col1|col2|
> +----+----+
> |   1|   a|
> |   2|   b|
> +----+----+
> {/code}
> This feels like an issue with consistency. As discussed on [PR 
> #23748|https://github.com/apache/spark/pull/23748], it might be a good idea 
> to standardize this to something specific (like zero-based indexing) for 
> these and other similar functions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26879) Inconsistency in default column names for functions like inline and stack

2019-02-14 Thread Jash Gala (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jash Gala updated SPARK-26879:
--
Description: 
In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
columns).

{code:title=spark-shell|borderStyle=solid}
scala> spark.sql("SELECT stack(2, 1, 2, 3)").show
+----+----+
|col0|col1|
+----+----+
|   1|   2|
|   3|null|
+----+----+


scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b')))").show
+----+----+
|col1|col2|
+----+----+
|   1|   a|
|   2|   b|
+----+----+
{code}

This feels like an issue with consistency. As discussed on [PR 
#23748|https://github.com/apache/spark/pull/23748], it might be a good idea to 
standardize this to something specific (like zero-based indexing) for these and 
other similar functions.
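
Queries that do not want to depend on the defaults at all can also name the generated columns in SQL. A spark-shell sketch (not from the ticket itself), written from memory against 2.4, so treat the exact syntax as illustrative rather than authoritative:

{code}
// Sketch only: give the generator output explicit names instead of relying
// on the default col0/col1 vs col1/col2 numbering.

// LATERAL VIEW with explicit column aliases:
spark.sql("""
  SELECT t.a, t.b
  FROM (SELECT 1) dummy
  LATERAL VIEW stack(2, 1, 2, 3) t AS a, b
""").show()

// The multi-alias form in the SELECT list should work as well:
spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b'))) AS (id, label)").show()
{code}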



  was:
In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
columns).

{code:title=spark-shell|borderStyle=solid}
scala> spark.sql("SELECT stack(2, 1, 2, 3)").show

| col0 | col1 |
|1 | 2|
|3 | null |

scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
'b')))").show

| col1 | col2 |
|1 | a|
|2 | b|

{code}

This feels like an issue with consistency. As discussed on [PR 
#23748|https://github.com/apache/spark/pull/23748], it might be a good idea to 
standardize this to something specific (like zero-based indexing) for these and 
other similar functions.




> Inconsistency in default column names for functions like inline and stack
> -
>
> Key: SPARK-26879
> URL: https://issues.apache.org/jira/browse/SPARK-26879
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Jash Gala
>Priority: Minor
>
> In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
> 1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
> columns).
> {code:title=spark-shell|borderStyle=solid}
> scala> spark.sql("SELECT stack(2, 1, 2, 3)").show
> +----+----+
> |col0|col1|
> +----+----+
> |   1|   2|
> |   3|null|
> +----+----+
> scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b')))").show
> +----+----+
> |col1|col2|
> +----+----+
> |   1|   a|
> |   2|   b|
> +----+----+
> {code}
> This feels like an issue with consistency. As discussed on [PR 
> #23748|https://github.com/apache/spark/pull/23748], it might be a good idea 
> to standardize this to something specific (like zero-based indexing) for 
> these and other similar functions.






[jira] [Updated] (SPARK-26879) Inconsistency in default column names for functions like inline and stack

2019-02-14 Thread Jash Gala (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jash Gala updated SPARK-26879:
--
Description: 
In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
columns).

Ex:
```
scala> spark.sql("SELECT stack(2, 1, 2, 3)").show

| col0 | col1 |
|1 | 2|
|3 | null |

scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
'b')))").show

| col1 | col2 |
|1 | a|
|2 | b|
```
This feels like an issue with consistency. As discussed on [PR 
#23748|https://github.com/apache/spark/pull/23748], it might be a good idea to 
standardize this to something specific (like zero-based indexing).

  was:
While looking at the default column names used by inline and stack, I found 
that inline uses col1, col2, etc. (i.e. 1-indexed columns), while stack uses 
col0, col1, col2, etc. (i.e. 0-indexed columns).

Ex:

scala> spark.sql("SELECT stack(2, 1, 2, 3)").show

| col0 | col1 |
|1 | 2|
|3 | null |

scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
'b')))").show

| col1 | col2 |
|1 | a|
|2 | b|

This feels like an issue with consistency. As discussed on [PR 
#23748|https://github.com/apache/spark/pull/23748], it might be a good idea to 
standardize this to something specific (like zero-based indexing).


> Inconsistency in default column names for functions like inline and stack
> -
>
> Key: SPARK-26879
> URL: https://issues.apache.org/jira/browse/SPARK-26879
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Jash Gala
>Priority: Minor
>
> In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
> 1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
> columns).
> Ex:
> ```
> scala> spark.sql("SELECT stack(2, 1, 2, 3)").show
> +----+----+
> |col0|col1|
> +----+----+
> |   1|   2|
> |   3|null|
> +----+----+
> scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b')))").show
> +----+----+
> |col1|col2|
> +----+----+
> |   1|   a|
> |   2|   b|
> +----+----+
> ```
> This feels like an issue with consistency. As discussed on [PR 
> #23748|https://github.com/apache/spark/pull/23748], it might be a good idea 
> to standardize this to something specific (like zero-based indexing).






[jira] [Updated] (SPARK-26879) Inconsistency in default column names for functions like inline and stack

2019-02-14 Thread Jash Gala (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jash Gala updated SPARK-26879:
--
Description: 
In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
columns).

{code:title=spark-shell|borderStyle=solid}
scala> spark.sql("SELECT stack(2, 1, 2, 3)").show

| col0 | col1 |
|1 | 2|
|3 | null |

scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
'b')))").show

| col1 | col2 |
|1 | a|
|2 | b|

{code}

This feels like an issue with consistency. As discussed on [PR 
#23748|https://github.com/apache/spark/pull/23748], it might be a good idea to 
standardize this to something specific (like zero-based indexing) for these and 
other similar functions.



  was:
In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
columns).

Ex:
{code|borderStyle=solid}
scala> spark.sql("SELECT stack(2, 1, 2, 3)").show

| col0 | col1 |
|1 | 2|
|3 | null |

scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
'b')))").show

| col1 | col2 |
|1 | a|
|2 | b|

{/code}

This feels like an issue with consistency. As discussed on [PR 
#23748|https://github.com/apache/spark/pull/23748], it might be a good idea to 
standardize this to something specific (like zero-based indexing) for these and 
other similar functions.




> Inconsistency in default column names for functions like inline and stack
> -
>
> Key: SPARK-26879
> URL: https://issues.apache.org/jira/browse/SPARK-26879
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Jash Gala
>Priority: Minor
>
> In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
> 1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
> columns).
> {code:title=spark-shell|borderStyle=solid}
> scala> spark.sql("SELECT stack(2, 1, 2, 3)").show
> +----+----+
> |col0|col1|
> +----+----+
> |   1|   2|
> |   3|null|
> +----+----+
> scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b')))").show
> +----+----+
> |col1|col2|
> +----+----+
> |   1|   a|
> |   2|   b|
> +----+----+
> {code}
> This feels like an issue with consistency. As discussed on [PR 
> #23748|https://github.com/apache/spark/pull/23748], it might be a good idea 
> to standardize this to something specific (like zero-based indexing) for 
> these and other similar functions.






[jira] [Updated] (SPARK-26879) Inconsistency in default column names for functions like inline and stack

2019-02-14 Thread Jash Gala (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jash Gala updated SPARK-26879:
--
Description: 
In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
columns).

Ex:
```
scala> spark.sql("SELECT stack(2, 1, 2, 3)").show

| col0 | col1 |
|1 | 2|
|3 | null |

scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
'b')))").show

| col1 | col2 |
|1 | a|
|2 | b|
```
This feels like an issue with consistency. As discussed on [PR 
#23748|https://github.com/apache/spark/pull/23748], it might be a good idea to 
standardize this to something specific (like zero-based indexing) for these and 
other similar functions.

  was:
In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
columns).

Ex:
```
scala> spark.sql("SELECT stack(2, 1, 2, 3)").show

| col0 | col1 |
|1 | 2|
|3 | null |

scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
'b')))").show

| col1 | col2 |
|1 | a|
|2 | b|
```
This feels like an issue with consistency. As discussed on [PR 
#23748|https://github.com/apache/spark/pull/23748], it might be a good idea to 
standardize this to something specific (like zero-based indexing).


> Inconsistency in default column names for functions like inline and stack
> -
>
> Key: SPARK-26879
> URL: https://issues.apache.org/jira/browse/SPARK-26879
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Jash Gala
>Priority: Minor
>
> In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
> 1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
> columns).
> Ex:
> ```
> scala> spark.sql("SELECT stack(2, 1, 2, 3)").show
> +----+----+
> |col0|col1|
> +----+----+
> |   1|   2|
> |   3|null|
> +----+----+
> scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b')))").show
> +----+----+
> |col1|col2|
> +----+----+
> |   1|   a|
> |   2|   b|
> +----+----+
> ```
> This feels like an issue with consistency. As discussed on [PR 
> #23748|https://github.com/apache/spark/pull/23748], it might be a good idea 
> to standardize this to something specific (like zero-based indexing) for 
> these and other similar functions.






[jira] [Updated] (SPARK-26879) Inconsistency in default column names for functions like inline and stack

2019-02-14 Thread Jash Gala (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jash Gala updated SPARK-26879:
--
Description: 
While looking at the default column names used by inline and stack, I found 
that inline uses col1, col2, etc. (i.e. 1-indexed columns), while stack uses 
col0, col1, col2, etc. (i.e. 0-indexed columns).

Ex:

scala> spark.sql("SELECT stack(2, 1, 2, 3)").show

| col0 | col1 |
|1 | 2|
|3 | null |

scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
'b')))").show

| col1 | col2 |
|1 | a|
|2 | b|

This feels like an issue with consistency. As discussed on [PR 
#23748|https://github.com/apache/spark/pull/23748], it might be a good idea to 
standardize this to something specific (like zero-based indexing).

  was:
While looking at the default column names used by inline and stack, I found 
that inline uses col1, col2, etc. (i.e. 1-indexed columns), while stack uses 
col0, col1, col2, etc. (i.e. 0-indexed columns).

Ex:

scala> spark.sql("SELECT stack(2, 1, 2, 3)").show

|--+--|
| col0 | col1 |
|--+--|
|1 | 2|
|3 | null |
|--+--|

scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
'b')))").show

|--+--|
| col1 | col2 |
|--+--|
|1 | a|
|2 | b|
|--+--|

This feels like an issue with consistency. As discussed on [PR 
#23748|https://github.com/apache/spark/pull/23748], it might be a good idea to 
standardize this to something specific (like zero-based indexing).


> Inconsistency in default column names for functions like inline and stack
> -
>
> Key: SPARK-26879
> URL: https://issues.apache.org/jira/browse/SPARK-26879
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Jash Gala
>Priority: Minor
>
> While looking at the default column names used by inline and stack, I found 
> that inline uses col1, col2, etc. (i.e. 1-indexed columns), while stack uses 
> col0, col1, col2, etc. (i.e. 0-indexed columns).
> Ex:
> scala> spark.sql("SELECT stack(2, 1, 2, 3)").show
> +----+----+
> |col0|col1|
> +----+----+
> |   1|   2|
> |   3|null|
> +----+----+
> scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b')))").show
> +----+----+
> |col1|col2|
> +----+----+
> |   1|   a|
> |   2|   b|
> +----+----+
> This feels like an issue with consistency. As discussed on [PR 
> #23748|https://github.com/apache/spark/pull/23748], it might be a good idea 
> to standardize this to something specific (like zero-based indexing).






[jira] [Created] (SPARK-26879) Inconsistency in default column names for functions like inline and stack

2019-02-14 Thread Jash Gala (JIRA)
Jash Gala created SPARK-26879:
-

 Summary: Inconsistency in default column names for functions like 
inline and stack
 Key: SPARK-26879
 URL: https://issues.apache.org/jira/browse/SPARK-26879
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Jash Gala


While looking at the default column names used by inline and stack, I found 
that inline uses col1, col2, etc. (i.e. 1-indexed columns), while stack uses 
col0, col1, col2, etc. (i.e. 0-indexed columns).

{{
scala> spark.sql("SELECT stack(2, 1, 2, 3)").show

|--+--|
| col0 | col1 |
|--+--|
|1 | 2|
|3 | null |
|--+--|

scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
'b')))").show

|--+--|
| col1 | col2 |
|--+--|
|1 | a|
|2 | b|
|--+--|
}}

This feels like an issue with consistency. As discussed on [PR 
#23748|https://github.com/apache/spark/pull/23748], it might be a good idea to 
standardize this to something specific (like zero-based indexing).
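
Part of the discrepancy plausibly comes from two different defaults meeting here: the struct literal already names its fields col1, col2 (1-based), and inline appears to surface those field names as-is, while stack generates its own 0-based output schema. A quick spark-shell check (sketch only, not from the ticket; behavior as I would expect it on 2.4):

{code}
// Quick check (assumes a spark-shell session): the struct literal carries
// 1-based field names, while stack builds a 0-based schema of its own.
spark.sql("SELECT struct(1, 'a')").printSchema()     // nested fields expected as col1 / col2
spark.sql("SELECT stack(2, 1, 2, 3)").printSchema()  // columns expected as col0 / col1
{code}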






[jira] [Updated] (SPARK-26879) Inconsistency in default column names for functions like inline and stack

2019-02-14 Thread Jash Gala (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jash Gala updated SPARK-26879:
--
Description: 
While looking at the default column names used by inline and stack, I found 
that inline uses col1, col2, etc. (i.e. 1-indexed columns), while stack uses 
col0, col1, col2, etc. (i.e. 0-indexed columns).

Ex:

scala> spark.sql("SELECT stack(2, 1, 2, 3)").show

|--+--|
| col0 | col1 |
|--+--|
|1 | 2|
|3 | null |
|--+--|

scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
'b')))").show

|--+--|
| col1 | col2 |
|--+--|
|1 | a|
|2 | b|
|--+--|

This feels like an issue with consistency. As discussed on [PR 
#23748|https://github.com/apache/spark/pull/23748], it might be a good idea to 
standardize this to something specific (like zero-based indexing).

  was:
While looking at the default column names used by inline and stack, I found 
that inline uses col1, col2, etc. (i.e. 1-indexed columns), while stack uses 
col0, col1, col2, etc. (i.e. 0-indexed columns).

{{
scala> spark.sql("SELECT stack(2, 1, 2, 3)").show

|--+--|
| col0 | col1 |
|--+--|
|1 | 2|
|3 | null |
|--+--|

scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
'b')))").show

|--+--|
| col1 | col2 |
|--+--|
|1 | a|
|2 | b|
|--+--|
}}

This feels like an issue with consistency. As discussed on [PR 
#23748|https://github.com/apache/spark/pull/23748], it might be a good idea to 
standardize this to something specific (like zero-based indexing).


> Inconsistency in default column names for functions like inline and stack
> -
>
> Key: SPARK-26879
> URL: https://issues.apache.org/jira/browse/SPARK-26879
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Jash Gala
>Priority: Minor
>
> While looking at the default column names used by inline and stack, I found 
> that inline uses col1, col2, etc. (i.e. 1-indexed columns), while stack uses 
> col0, col1, col2, etc. (i.e. 0-indexed columns).
> Ex:
> scala> spark.sql("SELECT stack(2, 1, 2, 3)").show
> +----+----+
> |col0|col1|
> +----+----+
> |   1|   2|
> |   3|null|
> +----+----+
> scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b')))").show
> +----+----+
> |col1|col2|
> +----+----+
> |   1|   a|
> |   2|   b|
> +----+----+
> This feels like an issue with consistency. As discussed on [PR 
> #23748|https://github.com/apache/spark/pull/23748], it might be a good idea 
> to standardize this to something specific (like zero-based indexing).






[jira] [Comment Edited] (SPARK-23619) Document the column names created by explode and posexplode functions

2019-02-08 Thread Jash Gala (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763564#comment-16763564
 ] 

Jash Gala edited comment on SPARK-23619 at 2/8/19 2:14 PM:
---

I've fixed this and raised a PR: https://github.com/apache/spark/pull/23748


was (Author: jashgala):
I'll fix this and raise a PR

> Document the column names created by explode and posexplode functions
> -
>
> Key: SPARK-23619
> URL: https://issues.apache.org/jira/browse/SPARK-23619
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.3.0
>Reporter: Joe Pallas
>Priority: Minor
>  Labels: documentation
>
> The documentation for {{explode}} and {{posexplode}} neglects to mention the 
> default column names for the new columns: {{col}} and {{pos}}.
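
For reference, the defaults in question are easy to confirm from spark-shell. A minimal sketch (not from the ticket itself), assuming the `spark` session is in scope:

{code}
// explode of an array exposes a single generated column named `col`;
// posexplode adds the position column `pos` in front of it.
spark.sql("SELECT explode(array(10, 20))").printSchema()
spark.sql("SELECT posexplode(array(10, 20))").printSchema()
{code}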






[jira] [Commented] (SPARK-23619) Document the column names created by explode and posexplode functions

2019-02-08 Thread Jash Gala (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763564#comment-16763564
 ] 

Jash Gala commented on SPARK-23619:
---

I'll fix this and raise a PR

> Document the column names created by explode and posexplode functions
> -
>
> Key: SPARK-23619
> URL: https://issues.apache.org/jira/browse/SPARK-23619
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.3.0
>Reporter: Joe Pallas
>Priority: Minor
>  Labels: documentation
>
> The documentation for {{explode}} and {{posexplode}} neglects to mention the 
> default column names for the new columns: {{col}} and {{pos}}.






[jira] [Commented] (SPARK-10892) Join with Data Frame returns wrong results

2019-02-07 Thread Jash Gala (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763001#comment-16763001
 ] 

Jash Gala commented on SPARK-10892:
---

 This issue is still reproducible in Spark 2.4.0.

> Join with Data Frame returns wrong results
> --
>
> Key: SPARK-10892
> URL: https://issues.apache.org/jira/browse/SPARK-10892
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Ofer Mendelevitch
>Priority: Critical
> Attachments: data.json
>
>
> I'm attaching a simplified reproducible example of the problem:
> 1. Loading a JSON file from HDFS as a Data Frame
> 2. Creating 3 data frames: PRCP, TMIN, TMAX
> 3. Joining the data frames together. Each of those has a column "value" with 
> the same name, so renaming them after the join.
> 4. The output seems incorrect; the first column has the correct values, but 
> the two other columns seem to have a copy of the values from the first column.
> Here's the sample code:
> {code}
> import org.apache.spark.sql._
> val sqlc = new SQLContext(sc)
> val weather = sqlc.read.format("json").load("data.json")
> val prcp = weather.filter("metric = 'PRCP'").as("prcp").cache()
> val tmin = weather.filter("metric = 'TMIN'").as("tmin").cache()
> val tmax = weather.filter("metric = 'TMAX'").as("tmax").cache()
> prcp.filter("year=2012 and month=10").show()
> tmin.filter("year=2012 and month=10").show()
> tmax.filter("year=2012 and month=10").show()
> val out = (prcp.join(tmin, "date_str").join(tmax, "date_str")
>   .select(prcp("year"), prcp("month"), prcp("day"), prcp("date_str"),
> prcp("value").alias("PRCP"), tmin("value").alias("TMIN"),
> tmax("value").alias("TMAX")) )
> out.filter("year=2012 and month=10").show()
> {code}
> The output is:
> {code}
> +--------+---+------+-----+-----------+-----+----+
> |date_str|day|metric|month|    station|value|year|
> +--------+---+------+-----+-----------+-----+----+
> |20121001|  1|  PRCP|   10|USW00023272|    0|2012|
> |20121002|  2|  PRCP|   10|USW00023272|    0|2012|
> |20121003|  3|  PRCP|   10|USW00023272|    0|2012|
> |20121004|  4|  PRCP|   10|USW00023272|    0|2012|
> |20121005|  5|  PRCP|   10|USW00023272|    0|2012|
> |20121006|  6|  PRCP|   10|USW00023272|    0|2012|
> |20121007|  7|  PRCP|   10|USW00023272|    0|2012|
> |20121008|  8|  PRCP|   10|USW00023272|    0|2012|
> |20121009|  9|  PRCP|   10|USW00023272|    0|2012|
> |20121010| 10|  PRCP|   10|USW00023272|    0|2012|
> |20121011| 11|  PRCP|   10|USW00023272|    3|2012|
> |20121012| 12|  PRCP|   10|USW00023272|    0|2012|
> |20121013| 13|  PRCP|   10|USW00023272|    0|2012|
> |20121014| 14|  PRCP|   10|USW00023272|    0|2012|
> |20121015| 15|  PRCP|   10|USW00023272|    0|2012|
> |20121016| 16|  PRCP|   10|USW00023272|    0|2012|
> |20121017| 17|  PRCP|   10|USW00023272|    0|2012|
> |20121018| 18|  PRCP|   10|USW00023272|    0|2012|
> |20121019| 19|  PRCP|   10|USW00023272|    0|2012|
> |20121020| 20|  PRCP|   10|USW00023272|    0|2012|
> +--------+---+------+-----+-----------+-----+----+
> +--------+---+------+-----+-----------+-----+----+
> |date_str|day|metric|month|    station|value|year|
> +--------+---+------+-----+-----------+-----+----+
> |20121001|  1|  TMIN|   10|USW00023272|  139|2012|
> |20121002|  2|  TMIN|   10|USW00023272|  178|2012|
> |20121003|  3|  TMIN|   10|USW00023272|  144|2012|
> |20121004|  4|  TMIN|   10|USW00023272|  144|2012|
> |20121005|  5|  TMIN|   10|USW00023272|  139|2012|
> |20121006|  6|  TMIN|   10|USW00023272|  128|2012|
> |20121007|  7|  TMIN|   10|USW00023272|  122|2012|
> |20121008|  8|  TMIN|   10|USW00023272|  122|2012|
> |20121009|  9|  TMIN|   10|USW00023272|  139|2012|
> |20121010| 10|  TMIN|   10|USW00023272|  128|2012|
> |20121011| 11|  TMIN|   10|USW00023272|  122|2012|
> |20121012| 12|  TMIN|   10|USW00023272|  117|2012|
> |20121013| 13|  TMIN|   10|USW00023272|  122|2012|
> |20121014| 14|  TMIN|   10|USW00023272|  128|2012|
> |20121015| 15|  TMIN|   10|USW00023272|  128|2012|
> |20121016| 16|  TMIN|   10|USW00023272|  156|2012|
> |20121017| 17|  TMIN|   10|USW00023272|  139|2012|
> |20121018| 18|  TMIN|   10|USW00023272|  161|2012|
> |20121019| 19|  TMIN|   10|USW00023272|  133|2012|
> |20121020| 20|  TMIN|   10|USW00023272|  122|2012|
> +--------+---+------+-----+-----------+-----+----+
> +--------+---+------+-----+-----------+-----+----+
> |date_str|day|metric|month|    station|value|year|
> +--------+---+------+-----+-----------+-----+----+
> |20121001|  1|  TMAX|   10|USW00023272|  322|2012|
> |20121002|  2|  TMAX|   10|USW00023272|  344|2012|
> |20121003|  3|  TMAX|   10|USW00023272|  222|2012|
> |20121004|  4|  TMAX|   10|USW00023272|  189|2012|
> |20121005|  5|  TMAX|   10|USW00023272|  194|2012|
> |20121006|  6|  TMAX|