[jira] [Commented] (SPARK-35885) Use keyserver.ubuntu.com as a keyserver for CRAN
[ https://issues.apache.org/jira/browse/SPARK-35885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447386#comment-17447386 ] Marek Novotny commented on SPARK-35885: --- [~dongjoon] FYI, buster-cran35 is signed by a different key (fingerprint '95C0FAF38DB3CCAD0C080A7BDC78B2DDEABC47B7') since 17th November (see [http://cloud.r-project.org/bin/linux/debian/]). According to the current repository state, this change might affect the branch [branch-3.0|https://github.com/apache/spark/blob/branch-3.0/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile#L32]. > Use keyserver.ubuntu.com as a keyserver for CRAN > > > Key: SPARK-35885 > URL: https://issues.apache.org/jira/browse/SPARK-35885 > Project: Spark > Issue Type: Bug > Components: Kubernetes, R >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 3.2.0, 3.1.3, 3.0.4 > > > This issue aims to use `keyserver.ubuntu.com` as a keyserver for CRAN. > K8s SparkR docker image build fails because both servers don't work correctly. > {code} > $ docker run -it --rm openjdk:11 /bin/bash > root@3e89a8d05378:/# echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list > root@3e89a8d05378:/# (apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' || apt-key adv --keyserver > keys.openpgp.org --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF') > Executing: /tmp/apt-key-gpghome.8lNIiUuhoE/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > Executing: /tmp/apt-key-gpghome.stxb8XUlx8/gpg.1.sh --keyserver > keys.openpgp.org --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: key AD5F960A256A04AF: new key but contains no user ID - skipped > gpg: Total number processed: 1 > gpg: w/o user IDs: 1 > root@3e89a8d05378:/# apt-get update > ... 
> Err:3 http://cloud.r-project.org/bin/linux/debian buster-cran35/ InRelease > The following signatures couldn't be verified because the public key is not > available: NO_PUBKEY FCAE2A0E115C3D8A > ... > W: GPG error: http://cloud.r-project.org/bin/linux/debian buster-cran35/ > InRelease: The following signatures couldn't be verified because the public > key is not available: NO_PUBKEY FCAE2A0E115C3D8A > E: The repository 'http://cloud.r-project.org/bin/linux/debian buster-cran35/ > InRelease' is not signed. > N: Updating from such a repository can't be done securely, and is therefore > disabled by default. > N: See apt-secure(8) manpage for repository creation and user configuration > details. > {code} > `keyserver.ubuntu.com` is a recommended backup server in CRAN document. > - http://cloud.r-project.org/bin/linux/debian/ > {code} > $ docker run -it --rm openjdk:11 /bin/bash > root@c9b183e45ffe:/# echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list > root@c9b183e45ffe:/# apt-key adv --keyserver keyserver.ubuntu.com --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' > Executing: /tmp/apt-key-gpghome.P6cxYkOge7/gpg.1.sh --keyserver > keyserver.ubuntu.com --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: key AD5F960A256A04AF: public key "Johannes Ranke (Wissenschaftlicher > Berater) " imported > gpg: Total number processed: 1 > gpg: imported: 1 > root@c9b183e45ffe:/# apt-get update > Get:1 http://deb.debian.org/debian buster InRelease [122 kB] > Get:2 http://security.debian.org/debian-security buster/updates InRelease > [65.4 kB] > Get:3 http://cloud.r-project.org/bin/linux/debian buster-cran35/ InRelease > [4375 B] > Get:4 http://deb.debian.org/debian buster-updates InRelease [51.9 kB] > Get:5 http://cloud.r-project.org/bin/linux/debian buster-cran35/ Packages > [53.3 kB] > Get:6 http://security.debian.org/debian-security buster/updates/main arm64 > Packages [287 kB] > Get:7 http://deb.debian.org/debian 
buster/main arm64 Packages [7735 kB] > Get:8 http://deb.debian.org/debian buster-updates/main arm64 Packages [14.5 > kB] > Fetched 8334 kB in 2s (4537 kB/s) > Reading package lists... Done > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25469) Eval methods of Concat, Reverse and ElementAt should use pattern matching only once
[ https://issues.apache.org/jira/browse/SPARK-25469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623222#comment-16623222 ] Marek Novotny commented on SPARK-25469: --- [~hyukjin.kwon] Thanks for your help! > Eval methods of Concat, Reverse and ElementAt should use pattern matching > only once > --- > > Key: SPARK-25469 > URL: https://issues.apache.org/jira/browse/SPARK-25469 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: Marek Novotny >Priority: Major > > Pattern matching is utilized for each call of {{Concat.eval}}, > {{Reverse.nullSafeEval}} and {{ElementAt.nullSafeEval}} and thus for each > record of a dataset. This could lead to a performance penalty when expressions > are executed in the interpreted mode. The goal of this ticket is to reduce > usage of pattern matching within the mentioned "eval" methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
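The idea behind the ticket can be sketched as follows. This is a hypothetical illustration in plain Python, not Spark's actual Scala code; the class and method names only mimic the expressions mentioned above. The type dispatch is resolved once at construction time instead of being re-matched for every record.

```python
# Hypothetical sketch of "pattern matching only once": the type dispatch runs
# a single time in the constructor, and the per-record eval path just invokes
# the cached function. Names (Reverse, null_safe_eval) are illustrative only.

class Reverse:
    def __init__(self, data_type: str):
        # The "pattern match" happens exactly once, here, instead of per row.
        if data_type == "string":
            self._do_reverse = lambda v: v[::-1]
        elif data_type == "array":
            self._do_reverse = lambda v: list(reversed(v))
        else:
            raise TypeError(f"unsupported type: {data_type}")

    def null_safe_eval(self, value):
        # Per-record path: no type inspection, just the pre-selected function.
        return self._do_reverse(value)
```

In interpreted mode this saves one type dispatch per record, which is exactly the per-call cost the ticket description identifies.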
[jira] [Commented] (SPARK-25469) Eval methods of Concat, Reverse and ElementAt should use pattern matching only once
[ https://issues.apache.org/jira/browse/SPARK-25469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621792#comment-16621792 ] Marek Novotny commented on SPARK-25469: --- [~hyukjin.kwon] Apologies for the duplicated ticket. By the way, do you have any idea why the ticket didn't get linked with the PR [#22471|https://github.com/apache/spark/pull/22471]? > Eval methods of Concat, Reverse and ElementAt should use pattern matching > only once > --- > > Key: SPARK-25469 > URL: https://issues.apache.org/jira/browse/SPARK-25469 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: Marek Novotny >Priority: Major > > Pattern matching is utilized for each call of {{Concat.eval}}, > {{Reverse.nullSafeEval}} and {{ElementAt.nullSafeEval}} and thus for each > record of a dataset. This could lead to a performance penalty when expressions > are executed in the interpreted mode. The goal of this ticket is to reduce > usage of pattern matching within the mentioned "eval" methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25469) Eval methods of Concat, Reverse and ElementAt should use pattern matching only once
[ https://issues.apache.org/jira/browse/SPARK-25469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-25469: -- Description: Pattern matching is utilized for each call of {{Concat.eval}}, {{Reverse.nullSafeEval}} and {{ElementAt.nullSafeEval}} and thus for each record of a dataset. This could lead to a performance penalty when expressions are executed in the interpreted mode. The goal of this ticket is to reduce usage of pattern matching within the mentioned "eval" methods. > Eval methods of Concat, Reverse and ElementAt should use pattern matching > only once > --- > > Key: SPARK-25469 > URL: https://issues.apache.org/jira/browse/SPARK-25469 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: Marek Novotny >Priority: Major > > Pattern matching is utilized for each call of {{Concat.eval}}, > {{Reverse.nullSafeEval}} and {{ElementAt.nullSafeEval}} and thus for each > record of a dataset. This could lead to a performance penalty when expressions > are executed in the interpreted mode. The goal of this ticket is to reduce > usage of pattern matching within the mentioned "eval" methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25469) Eval methods of Concat, Reverse and ElementAt should use pattern matching only once
[ https://issues.apache.org/jira/browse/SPARK-25469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-25469: -- Summary: Eval methods of Concat, Reverse and ElementAt should use pattern matching only once (was: Eval method of Concat expression should call pattern matching only once) > Eval methods of Concat, Reverse and ElementAt should use pattern matching > only once > --- > > Key: SPARK-25469 > URL: https://issues.apache.org/jira/browse/SPARK-25469 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: Marek Novotny >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25469) Eval method of Concat expression should call pattern matching only once
Marek Novotny created SPARK-25469: - Summary: Eval method of Concat expression should call pattern matching only once Key: SPARK-25469 URL: https://issues.apache.org/jira/browse/SPARK-25469 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0, 3.0.0 Reporter: Marek Novotny -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25470) Eval method of Concat expression should call pattern matching only once
Marek Novotny created SPARK-25470: - Summary: Eval method of Concat expression should call pattern matching only once Key: SPARK-25470 URL: https://issues.apache.org/jira/browse/SPARK-25470 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0, 3.0.0 Reporter: Marek Novotny -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25122) Deduplication of supports equals code
[ https://issues.apache.org/jira/browse/SPARK-25122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-25122: -- Description: The method {{*supportEquals}} determining whether elements of a data type could be used as items in a hash set or as keys in a hash map is duplicated across multiple collection and higher-order functions. The goal of this ticket is to deduplicate the method. was: The method \{{*supportEquals} determining whether elements of a data type could be used as items in a hash set or as keys in a hash map is duplicated across multiple collection and higher-order functions. The goal of this ticket is to deduplicate the method. > Deduplication of supports equals code > - > > Key: SPARK-25122 > URL: https://issues.apache.org/jira/browse/SPARK-25122 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Trivial > > The method {{*supportEquals}} determining whether elements of a data type > could be used as items in a hash set or as keys in a hash map is duplicated > across multiple collection and higher-order functions. > The goal of this ticket is to deduplicate the method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
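The deduplication described above can be sketched as a single shared predicate. This is a hypothetical plain-Python illustration; the function name and the type names are illustrative, not Spark's exact code.

```python
# Hypothetical sketch of the deduplicated "supports equals" helper: one
# predicate, shared by all collection and higher-order functions, deciding
# whether values of a type can serve as hash-set items or hash-map keys.

def type_with_proper_equals(data_type: str) -> bool:
    # Binary (byte-array) and nested array types lack an equals/hashCode
    # contract suitable for hashing; atomic types compare by value.
    return data_type not in ("binary", "array")
```

Each collection function would then call this helper instead of carrying its own copy of the type check.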
[jira] [Created] (SPARK-25122) Deduplication of supports equals code
Marek Novotny created SPARK-25122: - Summary: Deduplication of supports equals code Key: SPARK-25122 URL: https://issues.apache.org/jira/browse/SPARK-25122 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny The method \{{*supportEquals} determining whether elements of a data type could be used as items in a hash set or as keys in a hash map is duplicated across multiple collection and higher-order functions. The goal of this ticket is to deduplicate the method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25068) High-order function: exists(array, function) → boolean
[ https://issues.apache.org/jira/browse/SPARK-25068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579386#comment-16579386 ] Marek Novotny commented on SPARK-25068: --- That's a good point. Thanks for your answer! > High-order function: exists(array, function) → boolean > - > > Key: SPARK-25068 > URL: https://issues.apache.org/jira/browse/SPARK-25068 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 2.4.0 > > > Tests if arrays have those elements for which function returns true. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25068) High-order function: exists(array, function) → boolean
[ https://issues.apache.org/jira/browse/SPARK-25068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578087#comment-16578087 ] Marek Novotny commented on SPARK-25068: --- I am curious about the motivation for adding this function. :) Was it added just to get Spark aligned with some other SQL engine or standard? Or is the motivation to offer users a function with better performance in {{true}} cases compared to {{aggregate}}? If so, what about also adding a function like Scala's {{forall}} that would stop iterating when the lambda returns {{false}}? > High-order function: exists(array, function) → boolean > - > > Key: SPARK-25068 > URL: https://issues.apache.org/jira/browse/SPARK-25068 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 2.4.0 > > > Tests if arrays have those elements for which function returns true. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
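The short-circuit behaviour the comment asks about can be demonstrated in plain Python (this is only an illustration of the semantics, not Spark's codegen): an exists-style function stops at the first true, and a forall-style function would stop at the first false.

```python
# Counting predicate calls shows that both directions short-circuit and never
# scan the whole array once the answer is determined.

calls = 0

def is_even(x: int) -> bool:
    global calls
    calls += 1
    return x % 2 == 0

found = any(is_even(x) for x in [1, 2, 3, 4, 5])      # stops after seeing 2
all_even = all(is_even(x) for x in [2, 4, 1, 6])      # stops at 1
```

After both lines run, only five predicate calls have happened in total (two for `any`, three for `all`), even though the inputs hold nine elements combined.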
[jira] [Resolved] (SPARK-24042) High-order function: zip_with_index
[ https://issues.apache.org/jira/browse/SPARK-24042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny resolved SPARK-24042. --- Resolution: Won't Fix > High-order function: zip_with_index > --- > > Key: SPARK-24042 > URL: https://issues.apache.org/jira/browse/SPARK-24042 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Implement function {{zip_with_index(array[, indexFirst])}} that transforms > the input array by encapsulating elements into pairs with indexes indicating > the order. > Examples: > {{zip_with_index(array("d", "a", null, "b")) => > [("d",0),("a",1),(null,2),("b",3)]}} > {{zip_with_index(array("d", "a", null, "b"), true) => > [(0,"d"),(1,"a"),(2,null),(3,"b")]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
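The proposed semantics mirror Python's `enumerate` (or Scala's `zipWithIndex`). A plain-Python sketch of the two index orderings from the ticket's examples, with `None` standing in for a SQL null:

```python
# zip_with_index(array)            -> value-first pairs
# zip_with_index(array, True)      -> index-first pairs (per the ticket)

arr = ["d", "a", None, "b"]

value_first = [(x, i) for i, x in enumerate(arr)]  # [("d", 0), ("a", 1), (None, 2), ("b", 3)]
index_first = list(enumerate(arr))                 # [(0, "d"), (1, "a"), (2, None), (3, "b")]
```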
[jira] [Commented] (SPARK-23938) High-order function: map_zip_with(map, map, function) → map
[ https://issues.apache.org/jira/browse/SPARK-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570148#comment-16570148 ] Marek Novotny commented on SPARK-23938: --- I'm working on this. Thanks. > High-order function: map_zip_with(map, map, function V3>) → map > --- > > Key: SPARK-23938 > URL: https://issues.apache.org/jira/browse/SPARK-23938 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/map.html > Merges the two given maps into a single map by applying function to the pair > of values with the same key. For keys only presented in one map, NULL will be > passed as the value for the missing key. > {noformat} > SELECT map_zip_with(MAP(ARRAY[1, 2, 3], ARRAY['a', 'b', 'c']), -- {1 -> ad, 2 > -> be, 3 -> cf} > MAP(ARRAY[1, 2, 3], ARRAY['d', 'e', 'f']), > (k, v1, v2) -> concat(v1, v2)); > SELECT map_zip_with(MAP(ARRAY['k1', 'k2'], ARRAY[1, 2]), -- {k1 -> ROW(1, > null), k2 -> ROW(2, 4), k3 -> ROW(null, 9)} > MAP(ARRAY['k2', 'k3'], ARRAY[4, 9]), > (k, v1, v2) -> (v1, v2)); > SELECT map_zip_with(MAP(ARRAY['a', 'b', 'c'], ARRAY[1, 8, 27]), -- {a -> a1, > b -> b4, c -> c9} > MAP(ARRAY['a', 'b', 'c'], ARRAY[1, 2, 3]), > (k, v1, v2) -> k || CAST(v1/v2 AS VARCHAR)); > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
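The map_zip_with semantics quoted in the ticket can be sketched in plain Python (an illustration of the behaviour, not the Spark implementation): merge two maps over the union of their keys, passing `None` for a key missing on one side.

```python
# map_zip_with(m1, m2, f): for every key in either map, apply
# f(key, value_in_m1_or_None, value_in_m2_or_None).

def map_zip_with(m1: dict, m2: dict, f) -> dict:
    return {k: f(k, m1.get(k), m2.get(k)) for k in m1.keys() | m2.keys()}

# First Presto example from the ticket: concatenate the two values per key.
merged = map_zip_with(
    {1: "a", 2: "b"},
    {2: "e", 3: "f"},
    lambda k, v1, v2: (v1 or "") + (v2 or ""),
)
# merged == {1: "a", 2: "be", 3: "f"}
```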
[jira] [Commented] (SPARK-23901) Data Masking Functions
[ https://issues.apache.org/jira/browse/SPARK-23901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545443#comment-16545443 ] Marek Novotny commented on SPARK-23901: --- Is there a consensus on getting the masking functions to version 2.4.0? If so, what about extending also Python and R API to be consistent? (I volunteer for that.) > Data Masking Functions > -- > > Key: SPARK-23901 > URL: https://issues.apache.org/jira/browse/SPARK-23901 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Marco Gaido >Priority: Major > Fix For: 2.4.0 > > > - mask() > - mask_first_n() > - mask_last_n() > - mask_hash() > - mask_show_first_n() > - mask_show_last_n() > Reference: > [1] > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DataMaskingFunctions] > [2] https://issues.apache.org/jira/browse/HIVE-13568 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
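Hive's masking convention, which the referenced LanguageManual describes, can be sketched like this. This is a plain-Python illustration of the semantics (upper-case letters become 'X', lower-case 'x', digits 'n'), not the Spark implementation under review here.

```python
# mask(): replace upper-case with 'X', lower-case with 'x', digits with 'n';
# other characters pass through. mask_first_n() masks only the first n chars.

def mask(s: str) -> str:
    def mask_char(c: str) -> str:
        if c.isupper():
            return "X"
        if c.islower():
            return "x"
        if c.isdigit():
            return "n"
        return c
    return "".join(mask_char(c) for c in s)

def mask_first_n(s: str, n: int) -> str:
    return mask(s[:n]) + s[n:]
```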
[jira] [Commented] (SPARK-24741) Have a built-in AVRO data source implementation
[ https://issues.apache.org/jira/browse/SPARK-24741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543367#comment-16543367 ] Marek Novotny commented on SPARK-24741: --- Hi [~smilegator], The main motivation for creating ABRiS was to avoid duplicating the same logic in every project that (de)serializes Avro data read from Kafka via Structured Streaming. We highly appreciate the intention to introduce the SQL functions from_avro and to_avro, which will play that role. [~felipesmmelo] and I have reviewed the Spark-Avro library and found that it is heavily file-oriented; it will require some adaptations to be reusable by the internals of the newly proposed functions. We are keen to participate in and promote those adaptations, using our findings from ABRiS and feedback from users. RE: Confluent schema registry - managing schema dependencies among several producers without it could be a nightmare. But no problem, the connectivity could be handled by an external library. :) > Have a built-in AVRO data source implementation > --- > > Key: SPARK-24741 > URL: https://issues.apache.org/jira/browse/SPARK-24741 > Project: Spark > Issue Type: New Feature > Components: SQL, Structured Streaming >Affects Versions: 2.4.0 >Reporter: Xiao Li >Assignee: Gengliang Wang >Priority: Major > > Apache Avro (https://avro.apache.org) is a data serialization format. It is > widely used in the Spark and Hadoop ecosystem, especially for Kafka-based > data pipelines. Using Spark-Avro package > (https://github.com/databricks/spark-avro), Spark SQL can read and write the > avro data. Making spark-Avro built-in can provide a better experience for > first-time users of Spark SQL and structured streaming. We expect the > built-in Avro data source can further improve the adoption of structured > streaming. We should consider inlining > https://github.com/databricks/spark-avro. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24741) Have a built-in AVRO data source implementation
[ https://issues.apache.org/jira/browse/SPARK-24741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536907#comment-16536907 ] Marek Novotny commented on SPARK-24741: --- It would be nice if the built-in data source were interoperable with Structured Streaming and schema registries. (Maybe like [ABRiS|https://github.com/AbsaOSS/ABRiS] does?) cc [~felipesmmelo] > Have a built-in AVRO data source implementation > --- > > Key: SPARK-24741 > URL: https://issues.apache.org/jira/browse/SPARK-24741 > Project: Spark > Issue Type: New Feature > Components: SQL, Structured Streaming >Affects Versions: 2.4.0 >Reporter: Xiao Li >Priority: Major > > Apache Avro (https://avro.apache.org) is a data serialization format. It is > widely used in the Spark and Hadoop ecosystem, especially for Kafka-based > data pipelines. Using Spark-Avro package > (https://github.com/databricks/spark-avro), Spark SQL can read and write the > avro data. Making spark-Avro built-in can provide a better experience for > first-time users of Spark SQL and structured streaming. We expect the > built-in Avro data source can further improve the adoption of structured > streaming. We should consider inlining > https://github.com/databricks/spark-avro. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24165) UDF within when().otherwise() raises NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527734#comment-16527734 ] Marek Novotny commented on SPARK-24165: --- [~jingxuan001] Could you please provide a short example demonstrating the bug? I would like to double-check that my patch fixes exactly your problem. Thanks. > UDF within when().otherwise() raises NullPointerException > - > > Key: SPARK-24165 > URL: https://issues.apache.org/jira/browse/SPARK-24165 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jingxuan Wang >Priority: Major > > I have a UDF which takes java.sql.Timestamp and String as input column type > and returns an Array of (Seq[case class], Double) as output. Since some of > values in input columns can be nullable, I put the UDF inside a > when($input.isNull, null).otherwise(UDF) filter. Such function works well > when I test in spark shell. But running as a scala jar in spark-submit with > yarn cluster mode, it raised NullPointerException which points to the UDF > function. If I remove the when().otherwsie() condition, but put null check > inside the UDF, the function works without issue in spark-submit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24165) UDF within when().otherwise() raises NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527556#comment-16527556 ] Marek Novotny edited comment on SPARK-24165 at 6/29/18 12:33 PM: - It seems that Spark is not able resolve nullability for nested types correctly. {{val rows = new util.ArrayList[Row]()}} {{rows.add(Row(true, ("1", 1)))}} {{rows.add(Row(false, (null, 2)))}} {{val schema = StructType(Seq(}} {{StructField("cond", BooleanType, false),}} {{StructField("s", StructType(Seq(}} {{StructField("val1", StringType, true),}} {{StructField("val2", IntegerType, false)}} {{)))}} {{))}} {{val df = spark.createDataFrame(rows, schema)}} {{df.select(when('cond, expr("struct('x' as val1, 10 as val2)")).otherwise('s) as "result").printSchema()}} Result: {{root}} {{|-- result: struct (nullable = true)}} {{| |-- val1: string (nullable = *{color:#ff}false{color}*)}} {{| |-- val2: integer (nullable = false)}} I will take a look at the problem. was (Author: mn-mikke): It seems that Spark is not able resolve nullability for nested types correctly. {{val rows = new util.ArrayList[Row]()}} {{rows.add(Row(true, ("1", 1)))}} {{rows.add(Row(false, (null, 2)))}} {{val schema = StructType(Seq(}} {{ StructField("cond", BooleanType, false),}} {{ StructField("s", StructType(Seq(}} {{ StructField("val1", StringType, true),}} {{ StructField("val2", IntegerType, false)}} {{ )))}} {{))}} {{val df = spark.createDataFrame(rows, schema)}} {{df.select(when('cond, expr("struct('x' as val1, 10 as val2)")).otherwise('s) as "result").printSchema()}} Result: {{root}} {{ |-- result: struct (nullable = true)}} {{ | |-- val1: string (nullable = *{color:#FF}false{color}*)}} {{ | |-- val2: integer (nullable = false)}} I will take a look at the problem. 
> UDF within when().otherwise() raises NullPointerException > - > > Key: SPARK-24165 > URL: https://issues.apache.org/jira/browse/SPARK-24165 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jingxuan Wang >Priority: Major > > I have a UDF which takes java.sql.Timestamp and String as input column type > and returns an Array of (Seq[case class], Double) as output. Since some of > values in input columns can be nullable, I put the UDF inside a > when($input.isNull, null).otherwise(UDF) filter. Such function works well > when I test in spark shell. But running as a scala jar in spark-submit with > yarn cluster mode, it raised NullPointerException which points to the UDF > function. If I remove the when().otherwsie() condition, but put null check > inside the UDF, the function works without issue in spark-submit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24165) UDF within when().otherwise() raises NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527556#comment-16527556 ] Marek Novotny commented on SPARK-24165: --- It seems that Spark is not able resolve nullability for nested types correctly. {{val rows = new util.ArrayList[Row]()}} {{rows.add(Row(true, ("1", 1)))}} {{rows.add(Row(false, (null, 2)))}} {{val schema = StructType(Seq(}} {{ StructField("cond", BooleanType, false),}} {{ StructField("s", StructType(Seq(}} {{ StructField("val1", StringType, true),}} {{ StructField("val2", IntegerType, false)}} {{ )))}} {{))}} {{val df = spark.createDataFrame(rows, schema)}} {{df.select(when('cond, expr("struct('x' as val1, 10 as val2)")).otherwise('s) as "result").printSchema()}} Result: {{root}} {{ |-- result: struct (nullable = true)}} {{ | |-- val1: string (nullable = *{color:#FF}false{color}*)}} {{ | |-- val2: integer (nullable = false)}} I will take a look at the problem. > UDF within when().otherwise() raises NullPointerException > - > > Key: SPARK-24165 > URL: https://issues.apache.org/jira/browse/SPARK-24165 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jingxuan Wang >Priority: Major > > I have a UDF which takes java.sql.Timestamp and String as input column type > and returns an Array of (Seq[case class], Double) as output. Since some of > values in input columns can be nullable, I put the UDF inside a > when($input.isNull, null).otherwise(UDF) filter. Such function works well > when I test in spark shell. But running as a scala jar in spark-submit with > yarn cluster mode, it raised NullPointerException which points to the UDF > function. If I remove the when().otherwsie() condition, but put null check > inside the UDF, the function works without issue in spark-submit. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24636) Type Coercion of Arrays for array_join Function
[ https://issues.apache.org/jira/browse/SPARK-24636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-24636: -- Description: The description of SPARK-23916 refers to the implementation of {{array_join}} function in Presto. This implementation accepts arbitrary arrays of primitive types as input: {{presto> SELECT array_join(ARRAY [1, 2, 3], ', ');}} {{_col0}} {{-}} {{1, 2, 3}} {{(1 row)}} The goal of this Jira ticket is to change the current implementation of {{array_join}} function to support other types of arrays. was: The description of SPARK-23916 refers to the implementation of {{array_join}} function in Presto. This implementation accepts an arbitrary arrays of primitive types.: {{presto> SELECT array_join(ARRAY [1, 2, 3], ', ');}} {{_col0}} {{-}} {{1, 2, 3}} {{(1 row)}} The goal of this Jira ticket is to change the current implementation of array_join function to support other types of arrays. > Type Coercion of Arrays for array_join Function > --- > > Key: SPARK-24636 > URL: https://issues.apache.org/jira/browse/SPARK-24636 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > The description of SPARK-23916 refers to the implementation of {{array_join}} > function in Presto. This implementation accepts arbitrary arrays of primitive > types as input: > {{presto> SELECT array_join(ARRAY [1, 2, 3], ', ');}} > {{_col0}} > {{-}} > {{1, 2, 3}} > {{(1 row)}} > The goal of this Jira ticket is to change the current implementation of > {{array_join}} function to support other types of arrays. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
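The intended behaviour for non-string arrays can be sketched in plain Python (an illustration of the Presto semantics cited in the ticket, not the Spark code): elements are rendered as strings before joining, and nulls are skipped.

```python
# array_join(arr, delimiter): stringify each non-null element, then join.

def array_join(arr, delimiter: str) -> str:
    return delimiter.join(str(x) for x in arr if x is not None)

joined = array_join([1, 2, 3], ", ")  # "1, 2, 3", as in the Presto example
```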
[jira] [Created] (SPARK-24636) Type Coercion of Arrays for array_join Function
Marek Novotny created SPARK-24636: - Summary: Type Coercion of Arrays for array_join Function Key: SPARK-24636 URL: https://issues.apache.org/jira/browse/SPARK-24636 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny The description of SPARK-23916 refers to the implementation of {{array_join}} function in Presto. This implementation accepts an arbitrary arrays of primitive types.: {{presto> SELECT array_join(ARRAY [1, 2, 3], ', ');}} {{_col0}} {{-}} {{1, 2, 3}} {{(1 row)}} The goal of this Jira ticket is to change the current implementation of array_join function to support other types of arrays. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24331) Add arrays_overlap / array_repeat / map_entries
[ https://issues.apache.org/jira/browse/SPARK-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-24331: -- Description: Add SparkR equivalent to: * arrays_overlap - SPARK-23922 * array_repeat - SPARK-23925 * map_entries - SPARK-23935 was: Add SparkR equivalent to: * cardinality - SPARK-23923 * arrays_overlap - SPARK-23922 * array_repeat - SPARK-23925 * map_entries - SPARK-23935 > Add arrays_overlap / array_repeat / map_entries > - > > Key: SPARK-24331 > URL: https://issues.apache.org/jira/browse/SPARK-24331 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add SparkR equivalent to: > * arrays_overlap - SPARK-23922 > * array_repeat - SPARK-23925 > * map_entries - SPARK-23935 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24331) Add arrays_overlap / array_repeat / map_entries
[ https://issues.apache.org/jira/browse/SPARK-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-24331: -- Summary: Add arrays_overlap / array_repeat / map_entries(was: Add cardinality / arrays_overlap / array_repeat / map_entries ) > Add arrays_overlap / array_repeat / map_entries > - > > Key: SPARK-24331 > URL: https://issues.apache.org/jira/browse/SPARK-24331 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add SparkR equivalent to: > * cardinality - SPARK-23923 > * arrays_overlap - SPARK-23922 > * array_repeat - SPARK-23925 > * map_entries - SPARK-23935 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24331) Add cardinality / arrays_overlap / array_repeat / map_entries
[ https://issues.apache.org/jira/browse/SPARK-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482577#comment-16482577 ] Marek Novotny commented on SPARK-24331: --- I will work on this. Thanks! > Add cardinality / arrays_overlap / array_repeat / map_entries > --- > > Key: SPARK-24331 > URL: https://issues.apache.org/jira/browse/SPARK-24331 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add SparkR equivalent to: > * cardinality - SPARK-23923 > * arrays_overlap - SPARK-23922 > * array_repeat - SPARK-23925 > * map_entries - SPARK-23935 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24331) Add cardinality / arrays_overlap / array_repeat / map_entries
Marek Novotny created SPARK-24331: - Summary: Add cardinality / arrays_overlap / array_repeat / map_entries Key: SPARK-24331 URL: https://issues.apache.org/jira/browse/SPARK-24331 Project: Spark Issue Type: Sub-task Components: SparkR Affects Versions: 2.4.0 Reporter: Marek Novotny Add SparkR equivalent to: * cardinality - SPARK-23923 * arrays_overlap - SPARK-23922 * array_repeat - SPARK-23925 * map_entries - SPARK-23935 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24305) Avoid serialization of private fields in new collection expressions
Marek Novotny created SPARK-24305: - Summary: Avoid serialization of private fields in new collection expressions Key: SPARK-24305 URL: https://issues.apache.org/jira/browse/SPARK-24305 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny Make sure that private fields of expression case classes in _collectionOperations.scala_ are not serialized. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
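In Scala the usual fix for this is marking derived fields `@transient` (often `@transient lazy val`) so they are rebuilt after deserialization instead of shipped with the object. An analogous sketch in Python using pickle (class and field names are hypothetical, for illustration only):

```python
import pickle

class Expr:
    def __init__(self, children):
        self.children = children
        self._cached = None           # derived state; should not be serialized

    def evaluate(self):
        # Lazily compute and memoize the result (stand-in for real work).
        if self._cached is None:
            self._cached = sum(self.children)
        return self._cached

    def __getstate__(self):
        # Drop the derived field before pickling so only the
        # essential state travels over the wire.
        state = self.__dict__.copy()
        state["_cached"] = None
        return state

e = Expr([1, 2, 3])
e.evaluate()
e2 = pickle.loads(pickle.dumps(e))    # _cached arrives empty, recomputed on demand
```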
[jira] [Commented] (SPARK-23928) High-order function: shuffle(x) → array
[ https://issues.apache.org/jira/browse/SPARK-23928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476108#comment-16476108 ] Marek Novotny commented on SPARK-23928: --- [~hzlu] Cool. (Re: Testing) Sounds good to me. It would be difficult to try to test the randomness. Instead of that, I would refer to some well-known algorithm (e.g. Fisher–Yates). > High-order function: shuffle(x) → array > --- > > Key: SPARK-23928 > URL: https://issues.apache.org/jira/browse/SPARK-23928 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Generate a random permutation of the given array x. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
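The Fisher–Yates algorithm referenced in the comment is small enough to state directly; a plain-Python sketch (illustrative, not Spark's implementation):

```python
import random

def fisher_yates_shuffle(xs, rng=random):
    # Walk from the last index down, swapping each element with a
    # uniformly chosen element at or before it; every permutation is
    # equally likely when the RNG is uniform.
    out = list(xs)
    for i in range(len(out) - 1, 0, -1):
        j = rng.randrange(i + 1)
        out[i], out[j] = out[j], out[i]
    return out
```

As the comment notes, testing randomness directly is impractical; a deterministic test can seed the RNG and check only that the output is a permutation of the input.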
[jira] [Commented] (SPARK-23928) High-order function: shuffle(x) → array
[ https://issues.apache.org/jira/browse/SPARK-23928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475451#comment-16475451 ] Marek Novotny commented on SPARK-23928: --- [~hzlu] Any joy? I can take this one or help. > High-order function: shuffle(x) → array > --- > > Key: SPARK-23928 > URL: https://issues.apache.org/jira/browse/SPARK-23928 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Generate a random permutation of the given array x. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23931) High-order function: zip(array1, array2[, ...]) → array
[ https://issues.apache.org/jira/browse/SPARK-23931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472169#comment-16472169 ] Marek Novotny commented on SPARK-23931: --- Ok. Good luck! > High-order function: zip(array1, array2[, ...]) → array > > > Key: SPARK-23931 > URL: https://issues.apache.org/jira/browse/SPARK-23931 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Merges the given arrays, element-wise, into a single array of rows. The M-th > element of the N-th argument will be the N-th field of the M-th output > element. If the arguments have an uneven length, missing values are filled > with NULL. > {noformat} > SELECT zip(ARRAY[1, 2], ARRAY['1b', null, '3b']); -- [ROW(1, '1b'), ROW(2, > null), ROW(null, '3b')] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
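The NULL-padding rule quoted in the ticket is the subtle part of zip; a minimal Python model of the semantics (illustrative only, with None standing in for SQL NULL):

```python
def zip_arrays(*arrays):
    # Merge the arrays element-wise into rows; arrays shorter than
    # the longest one are padded with None, per the Presto contract.
    n = max(len(a) for a in arrays)
    return [tuple(a[i] if i < len(a) else None for a in arrays)
            for i in range(n)]
```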
[jira] [Commented] (SPARK-23931) High-order function: zip(array1, array2[, ...]) → array
[ https://issues.apache.org/jira/browse/SPARK-23931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472157#comment-16472157 ] Marek Novotny commented on SPARK-23931: --- [~DylanGuedes] Any joy? I can take this one if you want. > High-order function: zip(array1, array2[, ...]) → array > > > Key: SPARK-23931 > URL: https://issues.apache.org/jira/browse/SPARK-23931 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Merges the given arrays, element-wise, into a single array of rows. The M-th > element of the N-th argument will be the N-th field of the M-th output > element. If the arguments have an uneven length, missing values are filled > with NULL. > {noformat} > SELECT zip(ARRAY[1, 2], ARRAY['1b', null, '3b']); -- [ROW(1, '1b'), ROW(2, > null), ROW(null, '3b')] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24197) add array_sort function
[ https://issues.apache.org/jira/browse/SPARK-24197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-24197: -- Description: Add a SparkR equivalent function to [SPARK-23921|https://issues.apache.org/jira/browse/SPARK-23921]. (was: Add a SparkR equivalent function to SPARK-23921.) > add array_sort function > --- > > Key: SPARK-24197 > URL: https://issues.apache.org/jira/browse/SPARK-24197 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add a SparkR equivalent function to > [SPARK-23921|https://issues.apache.org/jira/browse/SPARK-23921]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24197) add array_sort function
[ https://issues.apache.org/jira/browse/SPARK-24197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-24197: -- Description: Add a SparkR equivalent function to SPARK-23921. (was: Add a SparkR equivalent function for [SPARK-23921|https://issues.apache.org/jira/browse/SPARK-23921].) > add array_sort function > --- > > Key: SPARK-24197 > URL: https://issues.apache.org/jira/browse/SPARK-24197 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add a SparkR equivalent function to SPARK-23921. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24198) add slice function
[ https://issues.apache.org/jira/browse/SPARK-24198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465627#comment-16465627 ] Marek Novotny commented on SPARK-24198: --- I will work on this. Thanks. > add slice function > -- > > Key: SPARK-24198 > URL: https://issues.apache.org/jira/browse/SPARK-24198 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add a SparkR equivalent function to > [SPARK-23930|https://issues.apache.org/jira/browse/SPARK-23930]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24198) add slice function
Marek Novotny created SPARK-24198: - Summary: add slice function Key: SPARK-24198 URL: https://issues.apache.org/jira/browse/SPARK-24198 Project: Spark Issue Type: Sub-task Components: SparkR Affects Versions: 2.4.0 Reporter: Marek Novotny Add a SparkR equivalent function to [SPARK-23930|https://issues.apache.org/jira/browse/SPARK-23930]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24197) add array_sort function
[ https://issues.apache.org/jira/browse/SPARK-24197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465626#comment-16465626 ] Marek Novotny commented on SPARK-24197: --- I will work on this. Thanks. > add array_sort function > --- > > Key: SPARK-24197 > URL: https://issues.apache.org/jira/browse/SPARK-24197 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add a SparkR equivalent function for > [SPARK-23921|https://issues.apache.org/jira/browse/SPARK-23921]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24197) add array_sort function
Marek Novotny created SPARK-24197: - Summary: add array_sort function Key: SPARK-24197 URL: https://issues.apache.org/jira/browse/SPARK-24197 Project: Spark Issue Type: Sub-task Components: SparkR Affects Versions: 2.4.0 Reporter: Marek Novotny Add a SparkR equivalent function for [SPARK-23921|https://issues.apache.org/jira/browse/SPARK-23921]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24148) Adding Ability to Specify SQL Type of Empty Arrays
Marek Novotny created SPARK-24148: - Summary: Adding Ability to Specify SQL Type of Empty Arrays Key: SPARK-24148 URL: https://issues.apache.org/jira/browse/SPARK-24148 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny The function array(...) doesn't let the user specify the element type and uses string as the default. This is a shortcoming in the API: the developer either needs a Scala case class and lets Catalyst infer the schema, or has to use the Java UDF API. This Jira suggests an enhancement of the array function to produce an empty array of a given element type. A perfect example of the use case is {{when(cond, trueExp).otherwise(falseExp)}}, which expects {{trueExp}} and {{falseExp}} to be of the same type. In a scenario where one of these branches should produce an empty array, there's no other way than creating a UDF. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
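The type-unification problem the ticket describes can be sketched outside Spark: a conditional expression must resolve both branches to one type, so an empty array literal that defaults to string type cannot unify with a typed branch. A minimal illustration (plain Python standing in for Catalyst's type check; the `TypedArray` and `when_otherwise` names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class TypedArray:
    element_type: str   # e.g. "int" or "string"
    values: list

def when_otherwise(cond, true_val, false_val):
    # Both branches must agree on element type, mirroring the check
    # Spark applies to when(...).otherwise(...).
    if true_val.element_type != false_val.element_type:
        raise TypeError("branch element types differ")
    return true_val if cond else false_val

# With an explicitly typed empty array, both branches unify cleanly:
empty_ints = TypedArray("int", [])
some_ints = TypedArray("int", [1, 2, 3])
```

An empty array whose element type defaulted to `"string"` would fail the same check, which is why the ticket asks for the type to be specifiable.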
[jira] [Commented] (SPARK-23935) High-order function: map_entries(map<K, V>) → array<row<K,V>>
[ https://issues.apache.org/jira/browse/SPARK-23935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456274#comment-16456274 ] Marek Novotny commented on SPARK-23935: --- I will work on this one. Thanks. > High-order function: map_entries(map<K, V>) → array<row<K,V>> > > - > > Key: SPARK-23935 > URL: https://issues.apache.org/jira/browse/SPARK-23935 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/map.html > Returns an array of all entries in the given map. > {noformat} > SELECT map_entries(MAP(ARRAY[1, 2], ARRAY['x', 'y'])); -- [ROW(1, 'x'), > ROW(2, 'y')] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
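The contract quoted above is a one-liner in most languages; a Python sketch of the semantics (illustrative only):

```python
def map_entries(m):
    # All (key, value) entries of the map as an array of rows,
    # in the map's iteration order.
    return list(m.items())
```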
[jira] [Commented] (SPARK-23934) High-order function: map_from_entries(array<row<K, V>>) → map<K,V>
[ https://issues.apache.org/jira/browse/SPARK-23934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456272#comment-16456272 ] Marek Novotny commented on SPARK-23934: --- I will work on this one. Thanks. > High-order function: map_from_entries(array<row<K, V>>) → map<K,V>
> -- > > Key: SPARK-23934 > URL: https://issues.apache.org/jira/browse/SPARK-23934 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/map.html > Returns a map created from the given array of entries. > {noformat} > SELECT map_from_entries(ARRAY[(1, 'x'), (2, 'y')]); -- {1 -> 'x', 2 -> 'y'} > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
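The inverse of map_entries can be modeled just as briefly (a Python sketch of the semantics; note that engines differ on duplicate keys — some raise an error, while this sketch lets the last entry win, so treat the duplicate policy as a design choice, not part of the quoted contract):

```python
def map_from_entries(entries):
    # Build a map from an array of (key, value) rows. In this sketch
    # a repeated key's later value overwrites the earlier one.
    return dict(entries)
```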
[jira] [Comment Edited] (SPARK-23925) High-order function: repeat(element, count) → array
[ https://issues.apache.org/jira/browse/SPARK-23925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454526#comment-16454526 ] Marek Novotny edited comment on SPARK-23925 at 4/26/18 4:47 PM: [~pepinoflo] Any joy? I can take this one or help if you want? was (Author: mn-mikke): @pepinoflo Any joy? I can take this one or help if you want? > High-order function: repeat(element, count) → array > --- > > Key: SPARK-23925 > URL: https://issues.apache.org/jira/browse/SPARK-23925 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Repeat element for count times. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23925) High-order function: repeat(element, count) → array
[ https://issues.apache.org/jira/browse/SPARK-23925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454526#comment-16454526 ] Marek Novotny commented on SPARK-23925: --- @pepinoflo Any joy? I can take this one or help if you want? > High-order function: repeat(element, count) → array > --- > > Key: SPARK-23925 > URL: https://issues.apache.org/jira/browse/SPARK-23925 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Repeat element for count times. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24042) High-order function: zip_with_index
Marek Novotny created SPARK-24042: - Summary: High-order function: zip_with_index Key: SPARK-24042 URL: https://issues.apache.org/jira/browse/SPARK-24042 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny Implement function {{zip_with_index(array[, indexFirst])}} that transforms the input array by encapsulating elements into pairs with indexes indicating the order. Examples: {{zip_with_index(array("d", "a", null, "b")) => [("d",0),("a",1),(null,2),("b",3)]}} {{zip_with_index(array("d", "a", null, "b"), true) => [(0,"d"),(1,"a"),(2,null),(3,"b")]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
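The two forms in the proposal's examples can be modeled directly (a Python sketch of the proposed semantics, not an implementation; None stands in for SQL null):

```python
def zip_with_index(arr, index_first=False):
    # Pair each element with its position; indexFirst flips the
    # order inside each pair, as in the second example above.
    if index_first:
        return [(i, e) for i, e in enumerate(arr)]
    return [(e, i) for i, e in enumerate(arr)]
```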
[jira] [Commented] (SPARK-23936) High-order function: map_concat(map1<K, V>, map2<K, V>, ..., mapN<K, V>) → map<K,V>
[ https://issues.apache.org/jira/browse/SPARK-23936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434185#comment-16434185 ] Marek Novotny commented on SPARK-23936: --- Shouldn't we overload the _concat_ function for maps instead of introducing _map_concat_? > High-order function: map_concat(map1<K, V>, map2<K, V>, ..., mapN<K, V>) → > map<K,V> > --- > > Key: SPARK-23936 > URL: https://issues.apache.org/jira/browse/SPARK-23936 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/map.html > Returns the union of all the given maps. If a key is found in multiple given > maps, that key’s value in the resulting map comes from the last one of those > maps. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
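Either spelling — a dedicated map_concat or an overloaded concat — would carry the same last-wins semantics quoted above; sketched in Python for illustration:

```python
def map_concat(*maps):
    # Union of the given maps; on a key collision, the value from
    # the last map containing that key wins.
    out = {}
    for m in maps:
        out.update(m)
    return out
```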
[jira] [Commented] (SPARK-23926) High-order function: reverse(x) → array
[ https://issues.apache.org/jira/browse/SPARK-23926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430539#comment-16430539 ] Marek Novotny commented on SPARK-23926: --- I take this one. > High-order function: reverse(x) → array > --- > > Key: SPARK-23926 > URL: https://issues.apache.org/jira/browse/SPARK-23926 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Returns an array which has the reversed order of array x. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23821) Collection function: flatten
[ https://issues.apache.org/jira/browse/SPARK-23821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-23821: -- Summary: Collection function: flatten (was: Collection functions: flatten) > Collection function: flatten > > > Key: SPARK-23821 > URL: https://issues.apache.org/jira/browse/SPARK-23821 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add the flatten function that transforms an array-of-arrays column into a > column of array elements. If the array structure contains more than two levels > of nesting, the function removes only one nesting level. > Example: > {{flatten(array(array(1, 2, 3), array(3, 4, 5), array(6, 7, 8))) => > [1,2,3,3,4,5,6,7,8]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23821) Collection functions: flatten
Marek Novotny created SPARK-23821: - Summary: Collection functions: flatten Key: SPARK-23821 URL: https://issues.apache.org/jira/browse/SPARK-23821 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny Add the flatten function that transforms an array-of-arrays column into a column of array elements. If the array structure contains more than two levels of nesting, the function removes only one nesting level. Example: {{flatten(array(array(1, 2, 3), array(3, 4, 5), array(6, 7, 8))) => [1,2,3,3,4,5,6,7,8]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
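"Removes only one nesting level" is the key constraint: deeper structures are not flattened recursively. A short Python sketch of that behavior (illustrative only):

```python
def flatten_one_level(arr):
    # Concatenate the inner arrays, removing exactly one level of
    # nesting; any deeper nesting inside the elements is preserved.
    out = []
    for inner in arr:
        out.extend(inner)
    return out
```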
[jira] [Created] (SPARK-23798) The CreateArray and ConcatArray should return the default array type when no children provided
Marek Novotny created SPARK-23798: - Summary: The CreateArray and ConcatArray should return the default array type when no children provided Key: SPARK-23798 URL: https://issues.apache.org/jira/browse/SPARK-23798 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny The expressions should return ArrayType.defaultConcreteType. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23504) Flaky test: RateSourceV2Suite.basic microbatch execution
[ https://issues.apache.org/jira/browse/SPARK-23504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415313#comment-16415313 ] Marek Novotny commented on SPARK-23504: --- Experienced the [same problem|https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88605/testReport/org.apache.spark.sql.execution.streaming/RateSourceV2Suite/basic_microbatch_execution/]. > Flaky test: RateSourceV2Suite.basic microbatch execution > > > Key: SPARK-23504 > URL: https://issues.apache.org/jira/browse/SPARK-23504 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.4.0 >Reporter: Marcelo Vanzin >Priority: Major > > Seen on an unrelated change: > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87635/testReport/org.apache.spark.sql.execution.streaming/RateSourceV2Suite/basic_microbatch_execution/ > {noformat} > Error Message > org.scalatest.exceptions.TestFailedException: == Results == !== Correct > Answer - 10 == == Spark Answer - 0 == !struct<_1:timestamp,_2:int> > struct<> ![1969-12-31 16:00:00.0,0] ![1969-12-31 16:00:00.1,1] > ![1969-12-31 16:00:00.2,2] ![1969-12-31 16:00:00.3,3] ![1969-12-31 > 16:00:00.4,4] ![1969-12-31 16:00:00.5,5] ![1969-12-31 16:00:00.6,6] > ![1969-12-31 16:00:00.7,7] ![1969-12-31 16:00:00.8,8] > ![1969-12-31 16:00:00.9,9]== Progress == > AdvanceRateManualClock(1) => CheckLastBatch: [1969-12-31 > 16:00:00.0,0],[1969-12-31 16:00:00.1,1],[1969-12-31 16:00:00.2,2],[1969-12-31 > 16:00:00.3,3],[1969-12-31 16:00:00.4,4],[1969-12-31 16:00:00.5,5],[1969-12-31 > 16:00:00.6,6],[1969-12-31 16:00:00.7,7],[1969-12-31 16:00:00.8,8],[1969-12-31 > 16:00:00.9,9]StopStream > StartStream(ProcessingTime(0),org.apache.spark.util.SystemClock@22bc97a,Map(),null) > AdvanceRateManualClock(2)CheckLastBatch: [1969-12-31 > 16:00:01.0,10],[1969-12-31 16:00:01.1,11],[1969-12-31 > 16:00:01.2,12],[1969-12-31 16:00:01.3,13],[1969-12-31 > 16:00:01.4,14],[1969-12-31 16:00:01.5,15],[1969-12-31 > 
16:00:01.6,16],[1969-12-31 16:00:01.7,17],[1969-12-31 > 16:00:01.8,18],[1969-12-31 16:00:01.9,19] == Stream == Output Mode: Append > Stream state: > {org.apache.spark.sql.execution.streaming.sources.RateStreamMicroBatchReader@75b88292: > {"0":{"value":-1,"runTimeMs":0}}} Thread state: alive Thread stack trace: > sun.misc.Unsafe.park(Native Method) > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202) > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153) > org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:222) > org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:633) > org.apache.spark.SparkContext.runJob(SparkContext.scala:2030) > org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2.scala:84) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247) > org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:294) > 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3272) > org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2722) > org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2722) > org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3253) > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77) > org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252) > org.apache.spark.sql.Dataset.collect(Dataset.scala:2722) >
[jira] [Updated] (SPARK-23736) Extension of the concat function to support array columns
[ https://issues.apache.org/jira/browse/SPARK-23736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-23736: -- Description: Extend the _concat_ function to also support array columns. Example: {{concat(array(1, 2, 3), array(10, 20, 30), array(100, 200)) => [1, 2, 3, 10, 20, 30, 100, 200]}} was: Extend the _concat_ function to also support array columns. Example: concat(array(1, 2, 3), array(10, 20, 30), array(100, 200)) => [1, 2, 3,10, 20, 30,100, 200] > Extension of the concat function to support array columns > - > > Key: SPARK-23736 > URL: https://issues.apache.org/jira/browse/SPARK-23736 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Extend the _concat_ function to also support array columns. > Example: > {{concat(array(1, 2, 3), array(10, 20, 30), array(100, 200)) => [1, 2, 3, 10, > 20, 30, 100, 200]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23736) Extension of the concat function to support array columns
[ https://issues.apache.org/jira/browse/SPARK-23736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-23736: -- Description: Extend the _concat_ function to also support array columns. Example: concat(array(1, 2, 3), array(10, 20, 30), array(100, 200)) => [1, 2, 3,10, 20, 30,100, 200] was: Implement the _concat_arrays_ function that merges two or more array columns into one. If any of children values is null, the function should return null. {{def concat_arrays(columns : Column*): Column }} Example: [1, 2, 3], [10, 20, 30], [100, 200] => [1, 2, 3,10, 20, 30,100, 200] Summary: Extension of the concat function to support array columns (was: collection function: concat_arrays) > Extension of the concat function to support array columns > - > > Key: SPARK-23736 > URL: https://issues.apache.org/jira/browse/SPARK-23736 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Extend the _concat_ function to also support array columns. > Example: > concat(array(1, 2, 3), array(10, 20, 30), array(100, 200)) => [1, 2, 3,10, > 20, 30,100, 200] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23736) collection function: concat_arrays
[ https://issues.apache.org/jira/browse/SPARK-23736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-23736: -- Issue Type: New Feature (was: Story) > collection function: concat_arrays > -- > > Key: SPARK-23736 > URL: https://issues.apache.org/jira/browse/SPARK-23736 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Implement the _concat_arrays_ function that merges two or more array columns > into one. If any of the child values is null, the function should return null. > {{def concat_arrays(columns: Column*): Column}} > Example: > [1, 2, 3], [10, 20, 30], [100, 200] => [1, 2, 3, 10, 20, 30, 100, 200] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23736) collection function: concat_arrays
Marek Novotny created SPARK-23736: - Summary: collection function: concat_arrays Key: SPARK-23736 URL: https://issues.apache.org/jira/browse/SPARK-23736 Project: Spark Issue Type: Story Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny Implement the _concat_arrays_ function that merges two or more array columns into one. If any of the child values is null, the function should return null. {{def concat_arrays(columns: Column*): Column}} Example: [1, 2, 3], [10, 20, 30], [100, 200] => [1, 2, 3, 10, 20, 30, 100, 200] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
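The null-propagation rule in the original concat_arrays proposal can be sketched as follows (plain Python for illustration; None stands in for SQL null):

```python
def concat_arrays(*columns):
    # A null (None) input makes the whole result null, per the
    # proposal; otherwise the arrays are concatenated in order.
    if any(c is None for c in columns):
        return None
    out = []
    for c in columns:
        out.extend(c)
    return out
```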