[jira] [Commented] (SPARK-35885) Use keyserver.ubuntu.com as a keyserver for CRAN
[ https://issues.apache.org/jira/browse/SPARK-35885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447386#comment-17447386 ] Marek Novotny commented on SPARK-35885: --- [~dongjoon] FYI, buster-cran35 is signed by a different key (fingerprint '95C0FAF38DB3CCAD0C080A7BDC78B2DDEABC47B7') since 17th November (see [http://cloud.r-project.org/bin/linux/debian/]). According to the current repository state, this change might affect the branch [branch-3.0|https://github.com/apache/spark/blob/branch-3.0/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile#L32]. > Use keyserver.ubuntu.com as a keyserver for CRAN > > > Key: SPARK-35885 > URL: https://issues.apache.org/jira/browse/SPARK-35885 > Project: Spark > Issue Type: Bug > Components: Kubernetes, R >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 3.2.0, 3.1.3, 3.0.4 > > > This issue aims to use `keyserver.ubuntu.com` as a keyserver for CRAN. > K8s SparkR docker image build fails because both servers don't work correctly. > {code} > $ docker run -it --rm openjdk:11 /bin/bash > root@3e89a8d05378:/# echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list > root@3e89a8d05378:/# (apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' || apt-key adv --keyserver > keys.openpgp.org --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF') > Executing: /tmp/apt-key-gpghome.8lNIiUuhoE/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > Executing: /tmp/apt-key-gpghome.stxb8XUlx8/gpg.1.sh --keyserver > keys.openpgp.org --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: key AD5F960A256A04AF: new key but contains no user ID - skipped > gpg: Total number processed: 1 > gpg: w/o user IDs: 1 > root@3e89a8d05378:/# apt-get update > ... 
> Err:3 http://cloud.r-project.org/bin/linux/debian buster-cran35/ InRelease > The following signatures couldn't be verified because the public key is not > available: NO_PUBKEY FCAE2A0E115C3D8A > ... > W: GPG error: http://cloud.r-project.org/bin/linux/debian buster-cran35/ > InRelease: The following signatures couldn't be verified because the public > key is not available: NO_PUBKEY FCAE2A0E115C3D8A > E: The repository 'http://cloud.r-project.org/bin/linux/debian buster-cran35/ > InRelease' is not signed. > N: Updating from such a repository can't be done securely, and is therefore > disabled by default. > N: See apt-secure(8) manpage for repository creation and user configuration > details. > {code} > `keyserver.ubuntu.com` is a recommended backup server in CRAN document. > - http://cloud.r-project.org/bin/linux/debian/ > {code} > $ docker run -it --rm openjdk:11 /bin/bash > root@c9b183e45ffe:/# echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list > root@c9b183e45ffe:/# apt-key adv --keyserver keyserver.ubuntu.com --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' > Executing: /tmp/apt-key-gpghome.P6cxYkOge7/gpg.1.sh --keyserver > keyserver.ubuntu.com --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: key AD5F960A256A04AF: public key "Johannes Ranke (Wissenschaftlicher > Berater) " imported > gpg: Total number processed: 1 > gpg: imported: 1 > root@c9b183e45ffe:/# apt-get update > Get:1 http://deb.debian.org/debian buster InRelease [122 kB] > Get:2 http://security.debian.org/debian-security buster/updates InRelease > [65.4 kB] > Get:3 http://cloud.r-project.org/bin/linux/debian buster-cran35/ InRelease > [4375 B] > Get:4 http://deb.debian.org/debian buster-updates InRelease [51.9 kB] > Get:5 http://cloud.r-project.org/bin/linux/debian buster-cran35/ Packages > [53.3 kB] > Get:6 http://security.debian.org/debian-security buster/updates/main arm64 > Packages [287 kB] > Get:7 http://deb.debian.org/debian 
buster/main arm64 Packages [7735 kB] > Get:8 http://deb.debian.org/debian buster-updates/main arm64 Packages [14.5 > kB] > Fetched 8334 kB in 2s (4537 kB/s) > Reading package lists... Done > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25469) Eval methods of Concat, Reverse and ElementAt should use pattern matching only once
[ https://issues.apache.org/jira/browse/SPARK-25469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623222#comment-16623222 ] Marek Novotny commented on SPARK-25469: --- [~hyukjin.kwon] Thanks for your help! > Eval methods of Concat, Reverse and ElementAt should use pattern matching > only once > --- > > Key: SPARK-25469 > URL: https://issues.apache.org/jira/browse/SPARK-25469 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: Marek Novotny >Priority: Major > > Pattern matching is utilized for each call of {{Concat.eval}}, > {{Reverse.nullSafeEval}} and {{ElementAt.nullSafeEval}} and thus for each > record of a dataset. This could lead to a performance penalty when expressions > are executed in the interpreted mode. The goal of this ticket is to reduce > usage of pattern matching within the mentioned "eval" methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
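The idea behind the ticket can be sketched as follows. This is a hypothetical illustration in plain Python, not Spark's actual Scala code; the class and method names only mimic the expressions mentioned above. The type dispatch is resolved once at construction time instead of being re-matched for every record.

```python
# Hypothetical sketch of "pattern matching only once": the type dispatch runs
# a single time in the constructor, and the per-record eval path just invokes
# the cached function. Names (Reverse, null_safe_eval) are illustrative only.

class Reverse:
    def __init__(self, data_type: str):
        # The "pattern match" happens exactly once, here, instead of per row.
        if data_type == "string":
            self._do_reverse = lambda v: v[::-1]
        elif data_type == "array":
            self._do_reverse = lambda v: list(reversed(v))
        else:
            raise TypeError(f"unsupported type: {data_type}")

    def null_safe_eval(self, value):
        # Per-record path: no type inspection, just the pre-selected function.
        return self._do_reverse(value)
```

In interpreted mode this saves one type dispatch per record, which is exactly the per-call cost the ticket description identifies.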
[jira] [Commented] (SPARK-25469) Eval methods of Concat, Reverse and ElementAt should use pattern matching only once
[ https://issues.apache.org/jira/browse/SPARK-25469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621792#comment-16621792 ] Marek Novotny commented on SPARK-25469: --- [~hyukjin.kwon] Apologies for the duplicated ticket. By the way, do you have any idea why the ticket didn't get linked with the PR [#22471|https://github.com/apache/spark/pull/22471]? > Eval methods of Concat, Reverse and ElementAt should use pattern matching > only once > --- > > Key: SPARK-25469 > URL: https://issues.apache.org/jira/browse/SPARK-25469 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: Marek Novotny >Priority: Major > > Pattern matching is utilized for each call of {{Concat.eval}}, > {{Reverse.nullSafeEval}} and {{ElementAt.nullSafeEval}} and thus for each > record of a dataset. This could lead to a performance penalty when expressions > are executed in the interpreted mode. The goal of this ticket is to reduce > usage of pattern matching within the mentioned "eval" methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25469) Eval methods of Concat, Reverse and ElementAt should use pattern matching only once
[ https://issues.apache.org/jira/browse/SPARK-25469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-25469: -- Description: Pattern matching is utilized for each call of {{Concat.eval}}, {{Reverse.nullSafeEval}} and {{ElementAt.nullSafeEval}} and thus for each record of a dataset. This could lead to a performance penalty when expressions are executed in the interpreted mode. The goal of this ticket is to reduce usage of pattern matching within the mentioned "eval" methods. > Eval methods of Concat, Reverse and ElementAt should use pattern matching > only once > --- > > Key: SPARK-25469 > URL: https://issues.apache.org/jira/browse/SPARK-25469 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: Marek Novotny >Priority: Major > > Pattern matching is utilized for each call of {{Concat.eval}}, > {{Reverse.nullSafeEval}} and {{ElementAt.nullSafeEval}} and thus for each > record of a dataset. This could lead to a performance penalty when expressions > are executed in the interpreted mode. The goal of this ticket is to reduce > usage of pattern matching within the mentioned "eval" methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25469) Eval methods of Concat, Reverse and ElementAt should use pattern matching only once
[ https://issues.apache.org/jira/browse/SPARK-25469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-25469: -- Summary: Eval methods of Concat, Reverse and ElementAt should use pattern matching only once (was: Eval method of Concat expression should call pattern matching only once) > Eval methods of Concat, Reverse and ElementAt should use pattern matching > only once > --- > > Key: SPARK-25469 > URL: https://issues.apache.org/jira/browse/SPARK-25469 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: Marek Novotny >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25469) Eval method of Concat expression should call pattern matching only once
Marek Novotny created SPARK-25469: - Summary: Eval method of Concat expression should call pattern matching only once Key: SPARK-25469 URL: https://issues.apache.org/jira/browse/SPARK-25469 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0, 3.0.0 Reporter: Marek Novotny -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25470) Eval method of Concat expression should call pattern matching only once
Marek Novotny created SPARK-25470: - Summary: Eval method of Concat expression should call pattern matching only once Key: SPARK-25470 URL: https://issues.apache.org/jira/browse/SPARK-25470 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0, 3.0.0 Reporter: Marek Novotny -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25122) Deduplication of supports equals code
[ https://issues.apache.org/jira/browse/SPARK-25122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-25122: -- Description: The method {{*supportEquals}} determining whether elements of a data type could be used as items in a hash set or as keys in a hash map is duplicated across multiple collection and higher-order functions. The goal of this ticket is to deduplicate the method. was: The method \{{*supportEquals} determining whether elements of a data type could be used as items in a hash set or as keys in a hash map is duplicated across multiple collection and higher-order functions. The goal of this ticket is to deduplicate the method. > Deduplication of supports equals code > - > > Key: SPARK-25122 > URL: https://issues.apache.org/jira/browse/SPARK-25122 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Trivial > > The method {{*supportEquals}} determining whether elements of a data type > could be used as items in a hash set or as keys in a hash map is duplicated > across multiple collection and higher-order functions. > The goal of this ticket is to deduplicate the method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
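The deduplication described above can be sketched as a single shared predicate. This is a hypothetical plain-Python illustration; the function name and the type names are illustrative, not Spark's exact code.

```python
# Hypothetical sketch of the deduplicated "supports equals" helper: one
# predicate, shared by all collection and higher-order functions, deciding
# whether values of a type can serve as hash-set items or hash-map keys.

def type_with_proper_equals(data_type: str) -> bool:
    # Binary (byte-array) and nested array types lack an equals/hashCode
    # contract suitable for hashing; atomic types compare by value.
    return data_type not in ("binary", "array")
```

Each collection function would then call this helper instead of carrying its own copy of the type check.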
[jira] [Created] (SPARK-25122) Deduplication of supports equals code
Marek Novotny created SPARK-25122: - Summary: Deduplication of supports equals code Key: SPARK-25122 URL: https://issues.apache.org/jira/browse/SPARK-25122 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny The method \{{*supportEquals} determining whether elements of a data type could be used as items in a hash set or as keys in a hash map is duplicated across multiple collection and higher-order functions. The goal of this ticket is to deduplicate the method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25068) High-order function: exists(array, function) → boolean
[ https://issues.apache.org/jira/browse/SPARK-25068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579386#comment-16579386 ] Marek Novotny commented on SPARK-25068: --- That's a good point. Thanks for your answer! > High-order function: exists(array, function) → boolean > - > > Key: SPARK-25068 > URL: https://issues.apache.org/jira/browse/SPARK-25068 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 2.4.0 > > > Tests if arrays have those elements for which function returns true. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25068) High-order function: exists(array, function) → boolean
[ https://issues.apache.org/jira/browse/SPARK-25068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578087#comment-16578087 ] Marek Novotny commented on SPARK-25068: --- I am curious about the motivation for adding this function. :) Was it added just to get Spark aligned with some other SQL engine or standard? Or is the motivation to offer users a function with better performance in {{true}} cases compared to {{aggregate}}? If so, what about also adding a function like Scala's {{forall}} that would stop iterating when the lambda returns {{false}}? > High-order function: exists(array, function) → boolean > - > > Key: SPARK-25068 > URL: https://issues.apache.org/jira/browse/SPARK-25068 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 2.4.0 > > > Tests if arrays have those elements for which function returns true. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
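The short-circuit behaviour the comment asks about can be demonstrated in plain Python (this is only an illustration of the semantics, not Spark's codegen): an exists-style function stops at the first true, and a forall-style function would stop at the first false.

```python
# Counting predicate calls shows that both directions short-circuit and never
# scan the whole array once the answer is determined.

calls = 0

def is_even(x: int) -> bool:
    global calls
    calls += 1
    return x % 2 == 0

found = any(is_even(x) for x in [1, 2, 3, 4, 5])      # stops after seeing 2
all_even = all(is_even(x) for x in [2, 4, 1, 6])      # stops at 1
```

After both lines run, only five predicate calls have happened in total (two for `any`, three for `all`), even though the inputs hold nine elements combined.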
[jira] [Resolved] (SPARK-24042) High-order function: zip_with_index
[ https://issues.apache.org/jira/browse/SPARK-24042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny resolved SPARK-24042. --- Resolution: Won't Fix > High-order function: zip_with_index > --- > > Key: SPARK-24042 > URL: https://issues.apache.org/jira/browse/SPARK-24042 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Implement function {{zip_with_index(array[, indexFirst])}} that transforms > the input array by encapsulating elements into pairs with indexes indicating > the order. > Examples: > {{zip_with_index(array("d", "a", null, "b")) => > [("d",0),("a",1),(null,2),("b",3)]}} > {{zip_with_index(array("d", "a", null, "b"), true) => > [(0,"d"),(1,"a"),(2,null),(3,"b")]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
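The proposed semantics mirror Python's `enumerate` (or Scala's `zipWithIndex`). A plain-Python sketch of the two index orderings from the ticket's examples, with `None` standing in for a SQL null:

```python
# zip_with_index(array)            -> value-first pairs
# zip_with_index(array, True)      -> index-first pairs (per the ticket)

arr = ["d", "a", None, "b"]

value_first = [(x, i) for i, x in enumerate(arr)]  # [("d", 0), ("a", 1), (None, 2), ("b", 3)]
index_first = list(enumerate(arr))                 # [(0, "d"), (1, "a"), (2, None), (3, "b")]
```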
[jira] [Commented] (SPARK-23938) High-order function: map_zip_with(map, map, function) → map
[ https://issues.apache.org/jira/browse/SPARK-23938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570148#comment-16570148 ] Marek Novotny commented on SPARK-23938: --- I'm working on this. Thanks. > High-order function: map_zip_with(map, map, function V3>) → map > --- > > Key: SPARK-23938 > URL: https://issues.apache.org/jira/browse/SPARK-23938 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/map.html > Merges the two given maps into a single map by applying function to the pair > of values with the same key. For keys only presented in one map, NULL will be > passed as the value for the missing key. > {noformat} > SELECT map_zip_with(MAP(ARRAY[1, 2, 3], ARRAY['a', 'b', 'c']), -- {1 -> ad, 2 > -> be, 3 -> cf} > MAP(ARRAY[1, 2, 3], ARRAY['d', 'e', 'f']), > (k, v1, v2) -> concat(v1, v2)); > SELECT map_zip_with(MAP(ARRAY['k1', 'k2'], ARRAY[1, 2]), -- {k1 -> ROW(1, > null), k2 -> ROW(2, 4), k3 -> ROW(null, 9)} > MAP(ARRAY['k2', 'k3'], ARRAY[4, 9]), > (k, v1, v2) -> (v1, v2)); > SELECT map_zip_with(MAP(ARRAY['a', 'b', 'c'], ARRAY[1, 8, 27]), -- {a -> a1, > b -> b4, c -> c9} > MAP(ARRAY['a', 'b', 'c'], ARRAY[1, 2, 3]), > (k, v1, v2) -> k || CAST(v1/v2 AS VARCHAR)); > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
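The map_zip_with semantics quoted in the ticket can be sketched in plain Python (an illustration of the behaviour, not the Spark implementation): merge two maps over the union of their keys, passing `None` for a key missing on one side.

```python
# map_zip_with(m1, m2, f): for every key in either map, apply
# f(key, value_in_m1_or_None, value_in_m2_or_None).

def map_zip_with(m1: dict, m2: dict, f) -> dict:
    return {k: f(k, m1.get(k), m2.get(k)) for k in m1.keys() | m2.keys()}

# First Presto example from the ticket: concatenate the two values per key.
merged = map_zip_with(
    {1: "a", 2: "b"},
    {2: "e", 3: "f"},
    lambda k, v1, v2: (v1 or "") + (v2 or ""),
)
# merged == {1: "a", 2: "be", 3: "f"}
```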
[jira] [Commented] (SPARK-23901) Data Masking Functions
[ https://issues.apache.org/jira/browse/SPARK-23901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545443#comment-16545443 ] Marek Novotny commented on SPARK-23901: --- Is there a consensus on getting the masking functions to version 2.4.0? If so, what about extending also Python and R API to be consistent? (I volunteer for that.) > Data Masking Functions > -- > > Key: SPARK-23901 > URL: https://issues.apache.org/jira/browse/SPARK-23901 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Marco Gaido >Priority: Major > Fix For: 2.4.0 > > > - mask() > - mask_first_n() > - mask_last_n() > - mask_hash() > - mask_show_first_n() > - mask_show_last_n() > Reference: > [1] > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DataMaskingFunctions] > [2] https://issues.apache.org/jira/browse/HIVE-13568 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
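Hive's masking convention, which the referenced LanguageManual describes, can be sketched like this. This is a plain-Python illustration of the semantics (upper-case letters become 'X', lower-case 'x', digits 'n'), not the Spark implementation under review here.

```python
# mask(): replace upper-case with 'X', lower-case with 'x', digits with 'n';
# other characters pass through. mask_first_n() masks only the first n chars.

def mask(s: str) -> str:
    def mask_char(c: str) -> str:
        if c.isupper():
            return "X"
        if c.islower():
            return "x"
        if c.isdigit():
            return "n"
        return c
    return "".join(mask_char(c) for c in s)

def mask_first_n(s: str, n: int) -> str:
    return mask(s[:n]) + s[n:]
```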
[jira] [Commented] (SPARK-24741) Have a built-in AVRO data source implementation
[ https://issues.apache.org/jira/browse/SPARK-24741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543367#comment-16543367 ] Marek Novotny commented on SPARK-24741: --- Hi [~smilegator], The main motivation for creating ABRiS was to avoid duplicating the same logic in every project that (de)serializes Avro data read from Kafka via Structured Streaming. We highly appreciate the intention to introduce the SQL functions from_avro and to_avro, which will play that role. [~felipesmmelo] and I have reviewed the Spark-Avro library and found that it is heavily file-oriented; it will require some adaptations to be reusable by the internals of the newly proposed functions. We are keen to participate in and promote those adaptations, using our findings from ABRiS and feedback from users. RE: Confluent schema registry - managing schema dependencies among several producers without it could be a nightmare. But no problem, the connectivity could be handled by an external library. :) > Have a built-in AVRO data source implementation > --- > > Key: SPARK-24741 > URL: https://issues.apache.org/jira/browse/SPARK-24741 > Project: Spark > Issue Type: New Feature > Components: SQL, Structured Streaming >Affects Versions: 2.4.0 >Reporter: Xiao Li >Assignee: Gengliang Wang >Priority: Major > > Apache Avro (https://avro.apache.org) is a data serialization format. It is > widely used in the Spark and Hadoop ecosystem, especially for Kafka-based > data pipelines. Using Spark-Avro package > (https://github.com/databricks/spark-avro), Spark SQL can read and write the > avro data. Making spark-Avro built-in can provide a better experience for > first-time users of Spark SQL and structured streaming. We expect the > built-in Avro data source can further improve the adoption of structured > streaming. We should consider inlining > https://github.com/databricks/spark-avro. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24741) Have a built-in AVRO data source implementation
[ https://issues.apache.org/jira/browse/SPARK-24741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536907#comment-16536907 ] Marek Novotny commented on SPARK-24741: --- It would be nice if the built-in data source were interoperable with Structured Streaming and schema registries. (Maybe like [ABRiS|https://github.com/AbsaOSS/ABRiS] does?) cc [~felipesmmelo] > Have a built-in AVRO data source implementation > --- > > Key: SPARK-24741 > URL: https://issues.apache.org/jira/browse/SPARK-24741 > Project: Spark > Issue Type: New Feature > Components: SQL, Structured Streaming >Affects Versions: 2.4.0 >Reporter: Xiao Li >Priority: Major > > Apache Avro (https://avro.apache.org) is a data serialization format. It is > widely used in the Spark and Hadoop ecosystem, especially for Kafka-based > data pipelines. Using Spark-Avro package > (https://github.com/databricks/spark-avro), Spark SQL can read and write the > avro data. Making spark-Avro built-in can provide a better experience for > first-time users of Spark SQL and structured streaming. We expect the > built-in Avro data source can further improve the adoption of structured > streaming. We should consider inlining > https://github.com/databricks/spark-avro. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24165) UDF within when().otherwise() raises NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527734#comment-16527734 ] Marek Novotny commented on SPARK-24165: --- [~jingxuan001] Could you please provide a short example demonstrating the bug? I would like to double-check that my patch fixes exactly your problem. Thanks. > UDF within when().otherwise() raises NullPointerException > - > > Key: SPARK-24165 > URL: https://issues.apache.org/jira/browse/SPARK-24165 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jingxuan Wang >Priority: Major > > I have a UDF which takes java.sql.Timestamp and String as input column type > and returns an Array of (Seq[case class], Double) as output. Since some of > values in input columns can be nullable, I put the UDF inside a > when($input.isNull, null).otherwise(UDF) filter. Such function works well > when I test in spark shell. But running as a scala jar in spark-submit with > yarn cluster mode, it raised NullPointerException which points to the UDF > function. If I remove the when().otherwsie() condition, but put null check > inside the UDF, the function works without issue in spark-submit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24165) UDF within when().otherwise() raises NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527556#comment-16527556 ] Marek Novotny edited comment on SPARK-24165 at 6/29/18 12:33 PM: - It seems that Spark is not able resolve nullability for nested types correctly. {{val rows = new util.ArrayList[Row]()}} {{rows.add(Row(true, ("1", 1)))}} {{rows.add(Row(false, (null, 2)))}} {{val schema = StructType(Seq(}} {{StructField("cond", BooleanType, false),}} {{StructField("s", StructType(Seq(}} {{StructField("val1", StringType, true),}} {{StructField("val2", IntegerType, false)}} {{)))}} {{))}} {{val df = spark.createDataFrame(rows, schema)}} {{df.select(when('cond, expr("struct('x' as val1, 10 as val2)")).otherwise('s) as "result").printSchema()}} Result: {{root}} {{|-- result: struct (nullable = true)}} {{| |-- val1: string (nullable = *{color:#ff}false{color}*)}} {{| |-- val2: integer (nullable = false)}} I will take a look at the problem. was (Author: mn-mikke): It seems that Spark is not able resolve nullability for nested types correctly. {{val rows = new util.ArrayList[Row]()}} {{rows.add(Row(true, ("1", 1)))}} {{rows.add(Row(false, (null, 2)))}} {{val schema = StructType(Seq(}} {{ StructField("cond", BooleanType, false),}} {{ StructField("s", StructType(Seq(}} {{ StructField("val1", StringType, true),}} {{ StructField("val2", IntegerType, false)}} {{ )))}} {{))}} {{val df = spark.createDataFrame(rows, schema)}} {{df.select(when('cond, expr("struct('x' as val1, 10 as val2)")).otherwise('s) as "result").printSchema()}} Result: {{root}} {{ |-- result: struct (nullable = true)}} {{ | |-- val1: string (nullable = *{color:#FF}false{color}*)}} {{ | |-- val2: integer (nullable = false)}} I will take a look at the problem. 
> UDF within when().otherwise() raises NullPointerException > - > > Key: SPARK-24165 > URL: https://issues.apache.org/jira/browse/SPARK-24165 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jingxuan Wang >Priority: Major > > I have a UDF which takes java.sql.Timestamp and String as input column type > and returns an Array of (Seq[case class], Double) as output. Since some of > values in input columns can be nullable, I put the UDF inside a > when($input.isNull, null).otherwise(UDF) filter. Such function works well > when I test in spark shell. But running as a scala jar in spark-submit with > yarn cluster mode, it raised NullPointerException which points to the UDF > function. If I remove the when().otherwsie() condition, but put null check > inside the UDF, the function works without issue in spark-submit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24165) UDF within when().otherwise() raises NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527556#comment-16527556 ] Marek Novotny commented on SPARK-24165: --- It seems that Spark is not able resolve nullability for nested types correctly. {{val rows = new util.ArrayList[Row]()}} {{rows.add(Row(true, ("1", 1)))}} {{rows.add(Row(false, (null, 2)))}} {{val schema = StructType(Seq(}} {{ StructField("cond", BooleanType, false),}} {{ StructField("s", StructType(Seq(}} {{ StructField("val1", StringType, true),}} {{ StructField("val2", IntegerType, false)}} {{ )))}} {{))}} {{val df = spark.createDataFrame(rows, schema)}} {{df.select(when('cond, expr("struct('x' as val1, 10 as val2)")).otherwise('s) as "result").printSchema()}} Result: {{root}} {{ |-- result: struct (nullable = true)}} {{ | |-- val1: string (nullable = *{color:#FF}false{color}*)}} {{ | |-- val2: integer (nullable = false)}} I will take a look at the problem. > UDF within when().otherwise() raises NullPointerException > - > > Key: SPARK-24165 > URL: https://issues.apache.org/jira/browse/SPARK-24165 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jingxuan Wang >Priority: Major > > I have a UDF which takes java.sql.Timestamp and String as input column type > and returns an Array of (Seq[case class], Double) as output. Since some of > values in input columns can be nullable, I put the UDF inside a > when($input.isNull, null).otherwise(UDF) filter. Such function works well > when I test in spark shell. But running as a scala jar in spark-submit with > yarn cluster mode, it raised NullPointerException which points to the UDF > function. If I remove the when().otherwsie() condition, but put null check > inside the UDF, the function works without issue in spark-submit. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24636) Type Coercion of Arrays for array_join Function
[ https://issues.apache.org/jira/browse/SPARK-24636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-24636: -- Description: The description of SPARK-23916 refers to the implementation of {{array_join}} function in Presto. This implementation accepts arbitrary arrays of primitive types as input: {{presto> SELECT array_join(ARRAY [1, 2, 3], ', ');}} {{_col0}} {{-}} {{1, 2, 3}} {{(1 row)}} The goal of this Jira ticket is to change the current implementation of {{array_join}} function to support other types of arrays. was: The description of SPARK-23916 refers to the implementation of {{array_join}} function in Presto. This implementation accepts an arbitrary arrays of primitive types.: {{presto> SELECT array_join(ARRAY [1, 2, 3], ', ');}} {{_col0}} {{-}} {{1, 2, 3}} {{(1 row)}} The goal of this Jira ticket is to change the current implementation of array_join function to support other types of arrays. > Type Coercion of Arrays for array_join Function > --- > > Key: SPARK-24636 > URL: https://issues.apache.org/jira/browse/SPARK-24636 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > The description of SPARK-23916 refers to the implementation of {{array_join}} > function in Presto. This implementation accepts arbitrary arrays of primitive > types as input: > {{presto> SELECT array_join(ARRAY [1, 2, 3], ', ');}} > {{_col0}} > {{-}} > {{1, 2, 3}} > {{(1 row)}} > The goal of this Jira ticket is to change the current implementation of > {{array_join}} function to support other types of arrays. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
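The intended behaviour for non-string arrays can be sketched in plain Python (an illustration of the Presto semantics cited in the ticket, not the Spark code): elements are rendered as strings before joining, and nulls are skipped.

```python
# array_join(arr, delimiter): stringify each non-null element, then join.

def array_join(arr, delimiter: str) -> str:
    return delimiter.join(str(x) for x in arr if x is not None)

joined = array_join([1, 2, 3], ", ")  # "1, 2, 3", as in the Presto example
```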
[jira] [Created] (SPARK-24636) Type Coercion of Arrays for array_join Function
Marek Novotny created SPARK-24636: - Summary: Type Coercion of Arrays for array_join Function Key: SPARK-24636 URL: https://issues.apache.org/jira/browse/SPARK-24636 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny The description of SPARK-23916 refers to the implementation of {{array_join}} function in Presto. This implementation accepts an arbitrary arrays of primitive types.: {{presto> SELECT array_join(ARRAY [1, 2, 3], ', ');}} {{_col0}} {{-}} {{1, 2, 3}} {{(1 row)}} The goal of this Jira ticket is to change the current implementation of array_join function to support other types of arrays. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24331) Add arrays_overlap / array_repeat / map_entries
[ https://issues.apache.org/jira/browse/SPARK-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-24331: -- Description: Add SparkR equivalent to: * arrays_overlap - SPARK-23922 * array_repeat - SPARK-23925 * map_entries - SPARK-23935 was: Add SparkR equivalent to: * cardinality - SPARK-23923 * arrays_overlap - SPARK-23922 * array_repeat - SPARK-23925 * map_entries - SPARK-23935 > Add arrays_overlap / array_repeat / map_entries > - > > Key: SPARK-24331 > URL: https://issues.apache.org/jira/browse/SPARK-24331 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add SparkR equivalent to: > * arrays_overlap - SPARK-23922 > * array_repeat - SPARK-23925 > * map_entries - SPARK-23935 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24331) Add arrays_overlap / array_repeat / map_entries
[ https://issues.apache.org/jira/browse/SPARK-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-24331: -- Summary: Add arrays_overlap / array_repeat / map_entries(was: Add cardinality / arrays_overlap / array_repeat / map_entries ) > Add arrays_overlap / array_repeat / map_entries > - > > Key: SPARK-24331 > URL: https://issues.apache.org/jira/browse/SPARK-24331 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add SparkR equivalent to: > * cardinality - SPARK-23923 > * arrays_overlap - SPARK-23922 > * array_repeat - SPARK-23925 > * map_entries - SPARK-23935 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24331) Add cardinality / arrays_overlap / array_repeat / map_entries
[ https://issues.apache.org/jira/browse/SPARK-24331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482577#comment-16482577 ] Marek Novotny commented on SPARK-24331: --- I will work on this. Thanks! > Add cardinality / arrays_overlap / array_repeat / map_entries > --- > > Key: SPARK-24331 > URL: https://issues.apache.org/jira/browse/SPARK-24331 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add SparkR equivalent to: > * cardinality - SPARK-23923 > * arrays_overlap - SPARK-23922 > * array_repeat - SPARK-23925 > * map_entries - SPARK-23935 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24331) Add cardinality / arrays_overlap / array_repeat / map_entries
Marek Novotny created SPARK-24331: - Summary: Add cardinality / arrays_overlap / array_repeat / map_entries Key: SPARK-24331 URL: https://issues.apache.org/jira/browse/SPARK-24331 Project: Spark Issue Type: Sub-task Components: SparkR Affects Versions: 2.4.0 Reporter: Marek Novotny Add SparkR equivalent to: * cardinality - SPARK-23923 * arrays_overlap - SPARK-23922 * array_repeat - SPARK-23925 * map_entries - SPARK-23935 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24305) Avoid serialization of private fields in new collection expressions
Marek Novotny created SPARK-24305: - Summary: Avoid serialization of private fields in new collection expressions Key: SPARK-24305 URL: https://issues.apache.org/jira/browse/SPARK-24305 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny Make sure that private fields of expression case classes in _collectionOperations.scala_ are not serialized. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
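In Scala the usual fix for this is marking derived fields `@transient` (often `@transient lazy val`) so they are rebuilt after deserialization instead of shipped with the object. An analogous sketch in Python using pickle (class and field names are hypothetical, for illustration only):

```python
import pickle

class Expr:
    def __init__(self, children):
        self.children = children
        self._cached = None           # derived state; should not be serialized

    def evaluate(self):
        # Lazily compute and memoize the result (stand-in for real work).
        if self._cached is None:
            self._cached = sum(self.children)
        return self._cached

    def __getstate__(self):
        # Drop the derived field before pickling so only the
        # essential state travels over the wire.
        state = self.__dict__.copy()
        state["_cached"] = None
        return state

e = Expr([1, 2, 3])
e.evaluate()
e2 = pickle.loads(pickle.dumps(e))    # _cached arrives empty, recomputed on demand
```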
[jira] [Commented] (SPARK-23928) High-order function: shuffle(x) → array
[ https://issues.apache.org/jira/browse/SPARK-23928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476108#comment-16476108 ] Marek Novotny commented on SPARK-23928: --- [~hzlu] Cool. (Re: Testing) Sounds good to me. It would be difficult to try to test the randomness. Instead of that, I would refer to some well-known algorithm (e.g. Fisher–Yates). > High-order function: shuffle(x) → array > --- > > Key: SPARK-23928 > URL: https://issues.apache.org/jira/browse/SPARK-23928 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Generate a random permutation of the given array x. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
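The Fisher–Yates algorithm referenced in the comment is small enough to state directly; a plain-Python sketch (illustrative, not Spark's implementation):

```python
import random

def fisher_yates_shuffle(xs, rng=random):
    # Walk from the last index down, swapping each element with a
    # uniformly chosen element at or before it; every permutation is
    # equally likely when the RNG is uniform.
    out = list(xs)
    for i in range(len(out) - 1, 0, -1):
        j = rng.randrange(i + 1)
        out[i], out[j] = out[j], out[i]
    return out
```

As the comment notes, testing randomness directly is impractical; a deterministic test can seed the RNG and check only that the output is a permutation of the input.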
[jira] [Commented] (SPARK-23928) High-order function: shuffle(x) → array
[ https://issues.apache.org/jira/browse/SPARK-23928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475451#comment-16475451 ] Marek Novotny commented on SPARK-23928: --- [~hzlu] Any joy? I can take this one or help. > High-order function: shuffle(x) → array > --- > > Key: SPARK-23928 > URL: https://issues.apache.org/jira/browse/SPARK-23928 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Generate a random permutation of the given array x. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23931) High-order function: zip(array1, array2[, ...]) → array
[ https://issues.apache.org/jira/browse/SPARK-23931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472169#comment-16472169 ] Marek Novotny commented on SPARK-23931: --- Ok. Good luck! > High-order function: zip(array1, array2[, ...]) → array > > > Key: SPARK-23931 > URL: https://issues.apache.org/jira/browse/SPARK-23931 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Merges the given arrays, element-wise, into a single array of rows. The M-th > element of the N-th argument will be the N-th field of the M-th output > element. If the arguments have an uneven length, missing values are filled > with NULL. > {noformat} > SELECT zip(ARRAY[1, 2], ARRAY['1b', null, '3b']); -- [ROW(1, '1b'), ROW(2, > null), ROW(null, '3b')] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
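The NULL-padding rule quoted in the ticket is the subtle part of zip; a minimal Python model of the semantics (illustrative only, with None standing in for SQL NULL):

```python
def zip_arrays(*arrays):
    # Merge the arrays element-wise into rows; arrays shorter than
    # the longest one are padded with None, per the Presto contract.
    n = max(len(a) for a in arrays)
    return [tuple(a[i] if i < len(a) else None for a in arrays)
            for i in range(n)]
```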
[jira] [Commented] (SPARK-23931) High-order function: zip(array1, array2[, ...]) → array
[ https://issues.apache.org/jira/browse/SPARK-23931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472157#comment-16472157 ] Marek Novotny commented on SPARK-23931: --- [~DylanGuedes] Any joy? I can take this one if you want. > High-order function: zip(array1, array2[, ...]) → array > > > Key: SPARK-23931 > URL: https://issues.apache.org/jira/browse/SPARK-23931 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Merges the given arrays, element-wise, into a single array of rows. The M-th > element of the N-th argument will be the N-th field of the M-th output > element. If the arguments have an uneven length, missing values are filled > with NULL. > {noformat} > SELECT zip(ARRAY[1, 2], ARRAY['1b', null, '3b']); -- [ROW(1, '1b'), ROW(2, > null), ROW(null, '3b')] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24197) add array_sort function
[ https://issues.apache.org/jira/browse/SPARK-24197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-24197: -- Description: Add a SparkR equivalent function to [SPARK-23921|https://issues.apache.org/jira/browse/SPARK-23921]. (was: Add a SparkR equivalent function to SPARK-23921.) > add array_sort function > --- > > Key: SPARK-24197 > URL: https://issues.apache.org/jira/browse/SPARK-24197 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add a SparkR equivalent function to > [SPARK-23921|https://issues.apache.org/jira/browse/SPARK-23921]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24197) add array_sort function
[ https://issues.apache.org/jira/browse/SPARK-24197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-24197: -- Description: Add a SparkR equivalent function to SPARK-23921. (was: Add a SparkR equivalent function for [SPARK-23921|https://issues.apache.org/jira/browse/SPARK-23921].) > add array_sort function > --- > > Key: SPARK-24197 > URL: https://issues.apache.org/jira/browse/SPARK-24197 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add a SparkR equivalent function to SPARK-23921. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24198) add slice function
[ https://issues.apache.org/jira/browse/SPARK-24198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465627#comment-16465627 ] Marek Novotny commented on SPARK-24198: --- I will work on this. Thanks. > add slice function > -- > > Key: SPARK-24198 > URL: https://issues.apache.org/jira/browse/SPARK-24198 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add a SparkR equivalent function to > [SPARK-23930|https://issues.apache.org/jira/browse/SPARK-23930]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24198) add slice function
Marek Novotny created SPARK-24198: - Summary: add slice function Key: SPARK-24198 URL: https://issues.apache.org/jira/browse/SPARK-24198 Project: Spark Issue Type: Sub-task Components: SparkR Affects Versions: 2.4.0 Reporter: Marek Novotny Add a SparkR equivalent function to [SPARK-23930|https://issues.apache.org/jira/browse/SPARK-23930]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24197) add array_sort function
[ https://issues.apache.org/jira/browse/SPARK-24197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465626#comment-16465626 ] Marek Novotny commented on SPARK-24197: --- I will work on this. Thanks. > add array_sort function > --- > > Key: SPARK-24197 > URL: https://issues.apache.org/jira/browse/SPARK-24197 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add a SparkR equivalent function for > [SPARK-23921|https://issues.apache.org/jira/browse/SPARK-23921]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24197) add array_sort function
Marek Novotny created SPARK-24197: - Summary: add array_sort function Key: SPARK-24197 URL: https://issues.apache.org/jira/browse/SPARK-24197 Project: Spark Issue Type: Sub-task Components: SparkR Affects Versions: 2.4.0 Reporter: Marek Novotny Add a SparkR equivalent function for [SPARK-23921|https://issues.apache.org/jira/browse/SPARK-23921]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24148) Adding Ability to Specify SQL Type of Empty Arrays
Marek Novotny created SPARK-24148: - Summary: Adding Ability to Specify SQL Type of Empty Arrays Key: SPARK-24148 URL: https://issues.apache.org/jira/browse/SPARK-24148 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny The function array(...) doesn't let the user specify the element type and uses string as the default. This is a shortcoming in the API: the developer either needs a Scala case class and lets Catalyst infer the schema, or has to use the Java UDF API. This Jira suggests an enhancement of the array function to produce an empty array of a given element type. A perfect example of the use case is {{when(cond, trueExp).otherwise(falseExp)}}, which expects {{trueExp}} and {{falseExp}} to be of the same type. In a scenario where one of these branches should produce an empty array, there's no other way than creating a UDF. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
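The type-unification problem the ticket describes can be sketched outside Spark: a conditional expression must resolve both branches to one type, so an empty array literal that defaults to string type cannot unify with a typed branch. A minimal illustration (plain Python standing in for Catalyst's type check; the `TypedArray` and `when_otherwise` names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class TypedArray:
    element_type: str   # e.g. "int" or "string"
    values: list

def when_otherwise(cond, true_val, false_val):
    # Both branches must agree on element type, mirroring the check
    # Spark applies to when(...).otherwise(...).
    if true_val.element_type != false_val.element_type:
        raise TypeError("branch element types differ")
    return true_val if cond else false_val

# With an explicitly typed empty array, both branches unify cleanly:
empty_ints = TypedArray("int", [])
some_ints = TypedArray("int", [1, 2, 3])
```

An empty array whose element type defaulted to `"string"` would fail the same check, which is why the ticket asks for the type to be specifiable.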
[jira] [Commented] (SPARK-23935) High-order function: map_entries(map<K, V>) → array<row<K,V>>
[ https://issues.apache.org/jira/browse/SPARK-23935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456274#comment-16456274 ] Marek Novotny commented on SPARK-23935: --- I will work on this one. Thanks. > High-order function: map_entries(map<K, V>) → array<row<K,V>> > > - > > Key: SPARK-23935 > URL: https://issues.apache.org/jira/browse/SPARK-23935 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/map.html > Returns an array of all entries in the given map. > {noformat} > SELECT map_entries(MAP(ARRAY[1, 2], ARRAY['x', 'y'])); -- [ROW(1, 'x'), > ROW(2, 'y')] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
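The contract quoted above is a one-liner in most languages; a Python sketch of the semantics (illustrative only):

```python
def map_entries(m):
    # All (key, value) entries of the map as an array of rows,
    # in the map's iteration order.
    return list(m.items())
```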
[jira] [Commented] (SPARK-23934) High-order function: map_from_entries(array<row<K, V>>) → map<K,V>
[ https://issues.apache.org/jira/browse/SPARK-23934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456272#comment-16456272 ] Marek Novotny commented on SPARK-23934: --- I will work on this one. Thanks. > High-order function: map_from_entries(array<row<K, V>>) → map<K,V>
> -- > > Key: SPARK-23934 > URL: https://issues.apache.org/jira/browse/SPARK-23934 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/map.html > Returns a map created from the given array of entries. > {noformat} > SELECT map_from_entries(ARRAY[(1, 'x'), (2, 'y')]); -- {1 -> 'x', 2 -> 'y'} > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
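The inverse of map_entries can be modeled just as briefly (a Python sketch of the semantics; note that engines differ on duplicate keys — some raise an error, while this sketch lets the last entry win, so treat the duplicate policy as a design choice, not part of the quoted contract):

```python
def map_from_entries(entries):
    # Build a map from an array of (key, value) rows. In this sketch
    # a repeated key's later value overwrites the earlier one.
    return dict(entries)
```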
[jira] [Comment Edited] (SPARK-23925) High-order function: repeat(element, count) → array
[ https://issues.apache.org/jira/browse/SPARK-23925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454526#comment-16454526 ] Marek Novotny edited comment on SPARK-23925 at 4/26/18 4:47 PM: [~pepinoflo] Any joy? I can take this one or help if you want? was (Author: mn-mikke): @pepinoflo Any joy? I can take this one or help if you want? > High-order function: repeat(element, count) → array > --- > > Key: SPARK-23925 > URL: https://issues.apache.org/jira/browse/SPARK-23925 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Repeat element for count times. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23925) High-order function: repeat(element, count) → array
[ https://issues.apache.org/jira/browse/SPARK-23925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454526#comment-16454526 ] Marek Novotny commented on SPARK-23925: --- @pepinoflo Any joy? I can take this one or help if you want? > High-order function: repeat(element, count) → array > --- > > Key: SPARK-23925 > URL: https://issues.apache.org/jira/browse/SPARK-23925 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Repeat element for count times. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24042) High-order function: zip_with_index
Marek Novotny created SPARK-24042: - Summary: High-order function: zip_with_index Key: SPARK-24042 URL: https://issues.apache.org/jira/browse/SPARK-24042 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny Implement function {{zip_with_index(array[, indexFirst])}} that transforms the input array by encapsulating elements into pairs with indexes indicating the order. Examples: {{zip_with_index(array("d", "a", null, "b")) => [("d",0),("a",1),(null,2),("b",3)]}} {{zip_with_index(array("d", "a", null, "b"), true) => [(0,"d"),(1,"a"),(2,null),(3,"b")]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
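The two forms in the proposal's examples can be modeled directly (a Python sketch of the proposed semantics, not an implementation; None stands in for SQL null):

```python
def zip_with_index(arr, index_first=False):
    # Pair each element with its position; indexFirst flips the
    # order inside each pair, as in the second example above.
    if index_first:
        return [(i, e) for i, e in enumerate(arr)]
    return [(e, i) for i, e in enumerate(arr)]
```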
[jira] [Commented] (SPARK-23936) High-order function: map_concat(map1<K, V>, map2<K, V>, ..., mapN<K, V>) → map<K,V>
[ https://issues.apache.org/jira/browse/SPARK-23936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434185#comment-16434185 ] Marek Novotny commented on SPARK-23936: --- Shouldn't we overload the _concat_ function for maps instead of introducing _map_concat_? > High-order function: map_concat(map1<K, V>, map2<K, V>, ..., mapN<K, V>) → > map<K,V> > --- > > Key: SPARK-23936 > URL: https://issues.apache.org/jira/browse/SPARK-23936 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/map.html > Returns the union of all the given maps. If a key is found in multiple given > maps, that key’s value in the resulting map comes from the last one of those > maps. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
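Either spelling — a dedicated map_concat or an overloaded concat — would carry the same last-wins semantics quoted above; sketched in Python for illustration:

```python
def map_concat(*maps):
    # Union of the given maps; on a key collision, the value from
    # the last map containing that key wins.
    out = {}
    for m in maps:
        out.update(m)
    return out
```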
[jira] [Commented] (SPARK-23926) High-order function: reverse(x) → array
[ https://issues.apache.org/jira/browse/SPARK-23926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430539#comment-16430539 ] Marek Novotny commented on SPARK-23926: --- I take this one. > High-order function: reverse(x) → array > --- > > Key: SPARK-23926 > URL: https://issues.apache.org/jira/browse/SPARK-23926 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Returns an array which has the reversed order of array x. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23821) Collection function: flatten
[ https://issues.apache.org/jira/browse/SPARK-23821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-23821: -- Summary: Collection function: flatten (was: Collection functions: flatten) > Collection function: flatten > > > Key: SPARK-23821 > URL: https://issues.apache.org/jira/browse/SPARK-23821 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Add the flatten function that transforms an array-of-arrays column into a > column of array elements. If the array structure contains more than two levels > of nesting, the function removes only one nesting level. > Example: > {{flatten(array(array(1, 2, 3), array(3, 4, 5), array(6, 7, 8))) => > [1,2,3,3,4,5,6,7,8]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23821) Collection functions: flatten
Marek Novotny created SPARK-23821: - Summary: Collection functions: flatten Key: SPARK-23821 URL: https://issues.apache.org/jira/browse/SPARK-23821 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny Add the flatten function that transforms an array-of-arrays column into a column of array elements. If the array structure contains more than two levels of nesting, the function removes only one nesting level. Example: {{flatten(array(array(1, 2, 3), array(3, 4, 5), array(6, 7, 8))) => [1,2,3,3,4,5,6,7,8]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
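"Removes only one nesting level" is the key constraint: deeper structures are not flattened recursively. A short Python sketch of that behavior (illustrative only):

```python
def flatten_one_level(arr):
    # Concatenate the inner arrays, removing exactly one level of
    # nesting; any deeper nesting inside the elements is preserved.
    out = []
    for inner in arr:
        out.extend(inner)
    return out
```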
[jira] [Created] (SPARK-23798) The CreateArray and ConcatArray should return the default array type when no children provided
Marek Novotny created SPARK-23798: - Summary: The CreateArray and ConcatArray should return the default array type when no children provided Key: SPARK-23798 URL: https://issues.apache.org/jira/browse/SPARK-23798 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny The expressions should return ArrayType.defaultConcreteType. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23504) Flaky test: RateSourceV2Suite.basic microbatch execution
[ https://issues.apache.org/jira/browse/SPARK-23504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415313#comment-16415313 ] Marek Novotny commented on SPARK-23504: --- Experienced the [same problem|https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88605/testReport/org.apache.spark.sql.execution.streaming/RateSourceV2Suite/basic_microbatch_execution/]. > Flaky test: RateSourceV2Suite.basic microbatch execution > > > Key: SPARK-23504 > URL: https://issues.apache.org/jira/browse/SPARK-23504 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.4.0 >Reporter: Marcelo Vanzin >Priority: Major > > Seen on an unrelated change: > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87635/testReport/org.apache.spark.sql.execution.streaming/RateSourceV2Suite/basic_microbatch_execution/ > {noformat} > Error Message > org.scalatest.exceptions.TestFailedException: == Results == !== Correct > Answer - 10 == == Spark Answer - 0 == !struct<_1:timestamp,_2:int> > struct<> ![1969-12-31 16:00:00.0,0] ![1969-12-31 16:00:00.1,1] > ![1969-12-31 16:00:00.2,2] ![1969-12-31 16:00:00.3,3] ![1969-12-31 > 16:00:00.4,4] ![1969-12-31 16:00:00.5,5] ![1969-12-31 16:00:00.6,6] > ![1969-12-31 16:00:00.7,7] ![1969-12-31 16:00:00.8,8] > ![1969-12-31 16:00:00.9,9]== Progress == > AdvanceRateManualClock(1) => CheckLastBatch: [1969-12-31 > 16:00:00.0,0],[1969-12-31 16:00:00.1,1],[1969-12-31 16:00:00.2,2],[1969-12-31 > 16:00:00.3,3],[1969-12-31 16:00:00.4,4],[1969-12-31 16:00:00.5,5],[1969-12-31 > 16:00:00.6,6],[1969-12-31 16:00:00.7,7],[1969-12-31 16:00:00.8,8],[1969-12-31 > 16:00:00.9,9]StopStream > StartStream(ProcessingTime(0),org.apache.spark.util.SystemClock@22bc97a,Map(),null) > AdvanceRateManualClock(2)CheckLastBatch: [1969-12-31 > 16:00:01.0,10],[1969-12-31 16:00:01.1,11],[1969-12-31 > 16:00:01.2,12],[1969-12-31 16:00:01.3,13],[1969-12-31 > 16:00:01.4,14],[1969-12-31 16:00:01.5,15],[1969-12-31 > 
16:00:01.6,16],[1969-12-31 16:00:01.7,17],[1969-12-31 > 16:00:01.8,18],[1969-12-31 16:00:01.9,19] == Stream == Output Mode: Append > Stream state: > {org.apache.spark.sql.execution.streaming.sources.RateStreamMicroBatchReader@75b88292: > {"0":{"value":-1,"runTimeMs":0}}} Thread state: alive Thread stack trace: > sun.misc.Unsafe.park(Native Method) > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202) > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153) > org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:222) > org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:633) > org.apache.spark.SparkContext.runJob(SparkContext.scala:2030) > org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2.scala:84) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) > org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247) > org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:294) > 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3272) > org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2722) > org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2722) > org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3253) > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77) > org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252) > org.apache.spark.sql.Dataset.collect(Dataset.scala:2722) >
[jira] [Updated] (SPARK-23736) Extension of the concat function to support array columns
[ https://issues.apache.org/jira/browse/SPARK-23736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-23736: -- Description: Extend the _concat_ function to also support array columns. Example: {{concat(array(1, 2, 3), array(10, 20, 30), array(100, 200)) => [1, 2, 3, 10, 20, 30, 100, 200]}} was: Extend the _concat_ function to also support array columns. Example: concat(array(1, 2, 3), array(10, 20, 30), array(100, 200)) => [1, 2, 3,10, 20, 30,100, 200] > Extension of the concat function to support array columns > - > > Key: SPARK-23736 > URL: https://issues.apache.org/jira/browse/SPARK-23736 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Extend the _concat_ function to also support array columns. > Example: > {{concat(array(1, 2, 3), array(10, 20, 30), array(100, 200)) => [1, 2, 3, 10, > 20, 30, 100, 200]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23736) Extension of the concat function to support array columns
[ https://issues.apache.org/jira/browse/SPARK-23736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-23736: -- Description: Extend the _concat_ function to also support array columns. Example: concat(array(1, 2, 3), array(10, 20, 30), array(100, 200)) => [1, 2, 3,10, 20, 30,100, 200] was: Implement the _concat_arrays_ function that merges two or more array columns into one. If any of children values is null, the function should return null. {{def concat_arrays(columns : Column*): Column }} Example: [1, 2, 3], [10, 20, 30], [100, 200] => [1, 2, 3,10, 20, 30,100, 200] Summary: Extension of the concat function to support array columns (was: collection function: concat_arrays) > Extension of the concat function to support array columns > - > > Key: SPARK-23736 > URL: https://issues.apache.org/jira/browse/SPARK-23736 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Extend the _concat_ function to also support array columns. > Example: > concat(array(1, 2, 3), array(10, 20, 30), array(100, 200)) => [1, 2, 3,10, > 20, 30,100, 200] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23736) collection function: concat_arrays
[ https://issues.apache.org/jira/browse/SPARK-23736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marek Novotny updated SPARK-23736: -- Issue Type: New Feature (was: Story) > collection function: concat_arrays > -- > > Key: SPARK-23736 > URL: https://issues.apache.org/jira/browse/SPARK-23736 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.0 >Reporter: Marek Novotny >Priority: Major > > Implement the _concat_arrays_ function that merges two or more array columns > into one. If any of the child values is null, the function should return null. > {{def concat_arrays(columns: Column*): Column}} > Example: > [1, 2, 3], [10, 20, 30], [100, 200] => [1, 2, 3, 10, 20, 30, 100, 200] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23736) collection function: concat_arrays
Marek Novotny created SPARK-23736: - Summary: collection function: concat_arrays Key: SPARK-23736 URL: https://issues.apache.org/jira/browse/SPARK-23736 Project: Spark Issue Type: Story Components: SQL Affects Versions: 2.4.0 Reporter: Marek Novotny Implement the _concat_arrays_ function that merges two or more array columns into one. If any of the child values is null, the function should return null. {{def concat_arrays(columns: Column*): Column}} Example: [1, 2, 3], [10, 20, 30], [100, 200] => [1, 2, 3, 10, 20, 30, 100, 200] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
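The null-propagation rule in the original concat_arrays proposal can be sketched as follows (plain Python for illustration; None stands in for SQL null):

```python
def concat_arrays(*columns):
    # A null (None) input makes the whole result null, per the
    # proposal; otherwise the arrays are concatenated in order.
    if any(c is None for c in columns):
        return None
    out = []
    for c in columns:
        out.extend(c)
    return out
```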