[jira] [Updated] (SPARK-30704) Use jekyll-redirect-from 0.15.0 instead of the latest
[ https://issues.apache.org/jira/browse/SPARK-30704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30704:
----------------------------------
    Description: 
We use Ruby 2.3 in our release docker image. The latest version of `jekyll-redirect-from` (0.16.0) causes a failure on Ruby 2.3.
- https://github.com/jekyll/jekyll-redirect-from/releases/tag/v0.16.0
{code}
root@dc0bc546e377:/# gem install jekyll-redirect-from
ERROR:  Error installing jekyll-redirect-from:
        jekyll-redirect-from requires Ruby version >= 2.4.0.
{code}

  was:
The latest version causes a failure on Ruby 2.3.
- https://github.com/jekyll/jekyll-redirect-from/releases/tag/v0.16.0

> Use jekyll-redirect-from 0.15.0 instead of the latest
> ------------------------------------------------------
>
>                 Key: SPARK-30704
>                 URL: https://issues.apache.org/jira/browse/SPARK-30704
>             Project: Spark
>          Issue Type: Bug
>          Components: Project Infra
>    Affects Versions: 2.4.5, 3.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Blocker
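The fix implied by the title is to pin the gem to the last release that still supports Ruby 2.3. A minimal sketch, using the standard RubyGems `-v` flag; the exact pinning mechanism used by the release image is an assumption here:
{code}
# Pin to 0.15.0, the last version that supports Ruby 2.3 (hypothetical fix sketch)
gem install jekyll-redirect-from -v 0.15.0
{code}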
[jira] [Created] (SPARK-30704) Use jekyll-redirect-from 0.15.0 instead of the latest
Dongjoon Hyun created SPARK-30704:
----------------------------------

             Summary: Use jekyll-redirect-from 0.15.0 instead of the latest
                 Key: SPARK-30704
                 URL: https://issues.apache.org/jira/browse/SPARK-30704
             Project: Spark
          Issue Type: Bug
          Components: Project Infra
    Affects Versions: 2.4.5, 3.0.0
            Reporter: Dongjoon Hyun

The latest version causes a failure on Ruby 2.3.
- https://github.com/jekyll/jekyll-redirect-from/releases/tag/v0.16.0
[jira] [Resolved] (SPARK-27686) Update migration guide
[ https://issues.apache.org/jira/browse/SPARK-27686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-27686.
-----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 27161
[https://github.com/apache/spark/pull/27161]

> Update migration guide
> -----------------------
>
>                 Key: SPARK-27686
>                 URL: https://issues.apache.org/jira/browse/SPARK-27686
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, SQL
>    Affects Versions: 3.0.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: hive-1.2.1-lib.tgz
>
> The built-in Hive 2.3 fixes the following issues:
> * HIVE-6727: Table level stats for external tables are set incorrectly.
> * HIVE-15653: Some ALTER TABLE commands drop table stats.
> * SPARK-12014: Spark SQL query containing semicolon is broken in Beeline.
> * SPARK-25193: insert overwrite doesn't throw exception when drop old data fails.
> * SPARK-25919: Date value corrupts when tables are "ParquetHiveSerDe" formatted and target table is Partitioned.
> * SPARK-26332: Spark sql write orc table on viewFS throws exception.
> * SPARK-26437: Decimal data becomes bigint to query, unable to query.
>
> We need to update the migration guide.
[jira] [Assigned] (SPARK-27686) Update migration guide
[ https://issues.apache.org/jira/browse/SPARK-27686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-27686:
-------------------------------------
    Assignee: Yuming Wang

> Update migration guide
> -----------------------
>
>                 Key: SPARK-27686
>                 URL: https://issues.apache.org/jira/browse/SPARK-27686
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, SQL
>    Affects Versions: 3.0.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Minor
>         Attachments: hive-1.2.1-lib.tgz
[jira] [Commented] (SPARK-30703) Add a documentation page for ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-30703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028253#comment-17028253 ]

Takeshi Yamamuro commented on SPARK-30703:
------------------------------------------

Yea, sure, I will do it this week ;)

> Add a documentation page for ANSI mode
> ---------------------------------------
>
>                 Key: SPARK-30703
>                 URL: https://issues.apache.org/jira/browse/SPARK-30703
>             Project: Spark
>          Issue Type: Documentation
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xiao Li
>            Priority: Major
>
> ANSI mode is introduced in Spark 3.0. We need to clearly document the behavior difference when spark.sql.ansi.enabled is on and off.
[jira] [Commented] (SPARK-30703) Add a documentation page for ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-30703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028251#comment-17028251 ]

Xiao Li commented on SPARK-30703:
---------------------------------

[~maropu] Could you help with this?

> Add a documentation page for ANSI mode
> ---------------------------------------
>
>                 Key: SPARK-30703
>                 URL: https://issues.apache.org/jira/browse/SPARK-30703
>             Project: Spark
>          Issue Type: Documentation
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xiao Li
>            Priority: Major
>
> ANSI mode is introduced in Spark 3.0. We need to clearly document the behavior difference when spark.sql.ansi.enabled is on and off.
[jira] [Created] (SPARK-30703) Add a documentation page for ANSI mode
Xiao Li created SPARK-30703:
----------------------------

             Summary: Add a documentation page for ANSI mode
                 Key: SPARK-30703
                 URL: https://issues.apache.org/jira/browse/SPARK-30703
             Project: Spark
          Issue Type: Documentation
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Xiao Li

ANSI mode is introduced in Spark 3.0. We need to clearly document the behavior difference when spark.sql.ansi.enabled is on and off.
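A small example of the kind of difference the page should document; the cast behavior shown reflects my understanding of Spark 3.0 semantics and should be verified against the release:
{code:java}
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT CAST('abc' AS INT)").show()   // default: an invalid cast yields NULL

spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT CAST('abc' AS INT)").show()   // ANSI mode: throws a runtime error instead
{code}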
[jira] [Updated] (SPARK-19842) Informational Referential Integrity Constraints Support in Spark
[ https://issues.apache.org/jira/browse/SPARK-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-19842:
----------------------------------
    Target Version/s:   (was: 3.0.0)

> Informational Referential Integrity Constraints Support in Spark
> -----------------------------------------------------------------
>
>                 Key: SPARK-19842
>                 URL: https://issues.apache.org/jira/browse/SPARK-19842
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Ioana Delaney
>            Priority: Major
>         Attachments: InformationalRIConstraints.doc
>
> *Informational Referential Integrity Constraints Support in Spark*
> This work proposes support for _informational primary key_ and _foreign key (referential integrity) constraints_ in Spark. The main purpose is to open up an area of query optimization techniques that rely on referential-integrity constraint semantics.
> An _informational_ or _statistical constraint_ is a constraint such as a _unique_, _primary key_, _foreign key_, or _check constraint_ that can be used by Spark to improve query performance. Informational constraints are not enforced by the Spark SQL engine; rather, they are used by Catalyst to optimize query processing. They provide semantic information that allows Catalyst to rewrite queries to eliminate joins, push down aggregates, remove unnecessary Distinct operations, and perform a number of other optimizations.
> Informational constraints are primarily targeted at applications that load and analyze data that originated from a data warehouse. For such applications, the conditions for a given constraint are known to be true, so the constraint does not need to be enforced during data load operations.
> The attached document covers constraint definition, metastore storage, constraint validation, and maintenance. The document shows many examples of query performance improvements that utilize referential integrity constraints and can be implemented in Spark.
> Link to the google doc: [InformationalRIConstraints|https://docs.google.com/document/d/17r-cOqbKF7Px0xb9L7krKg2-RQB_gD2pxOmklm-ehsw/edit]
[jira] [Commented] (SPARK-19842) Informational Referential Integrity Constraints Support in Spark
[ https://issues.apache.org/jira/browse/SPARK-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028234#comment-17028234 ]

Dongjoon Hyun commented on SPARK-19842:
---------------------------------------

I removed `Target Version: 3.0.0` because we have created `branch-3.0` and entered the `Feature Freeze` phase.

> Informational Referential Integrity Constraints Support in Spark
> -----------------------------------------------------------------
>
>                 Key: SPARK-19842
>                 URL: https://issues.apache.org/jira/browse/SPARK-19842
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Ioana Delaney
>            Priority: Major
>         Attachments: InformationalRIConstraints.doc
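A concrete illustration of the join-elimination rewrite the proposal enables; the schema and constraint below are hypothetical:
{code}
-- Assume sales.store_id is a NOT NULL informational foreign key to store.store_id.
SELECT s.sale_id, s.amount
FROM sales s JOIN store st ON s.store_id = st.store_id;

-- No column of store is referenced, and the FK guarantees every sale matches
-- exactly one store, so Catalyst could drop the join entirely:
SELECT s.sale_id, s.amount FROM sales s;
{code}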
[jira] [Updated] (SPARK-20964) Make some keywords reserved along with the ANSI/SQL standard
[ https://issues.apache.org/jira/browse/SPARK-20964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-20964:
----------------------------------
    Target Version/s:   (was: 3.0.0)

> Make some keywords reserved along with the ANSI/SQL standard
> -------------------------------------------------------------
>
>                 Key: SPARK-20964
>                 URL: https://issues.apache.org/jira/browse/SPARK-20964
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: Takeshi Yamamuro
>            Priority: Minor
>
> Spark currently has many non-reserved words that are essentially reserved in the ANSI/SQL standard (http://developer.mimer.se/validator/sql-reserved-words.tml).
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4#L709
> This is because many datasources (for instance twitter4j) unfortunately use reserved keywords for column names (see [~hvanhovell]'s comments: https://github.com/apache/spark/pull/18079#discussion_r118842186). We might fix this issue in future major releases.
[jira] [Commented] (SPARK-20964) Make some keywords reserved along with the ANSI/SQL standard
[ https://issues.apache.org/jira/browse/SPARK-20964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028233#comment-17028233 ]

Dongjoon Hyun commented on SPARK-20964:
---------------------------------------

Hi, [~maropu]. I removed the target version first. Please resolve this if it is done in 3.0.0.

> Make some keywords reserved along with the ANSI/SQL standard
> -------------------------------------------------------------
>
>                 Key: SPARK-20964
>                 URL: https://issues.apache.org/jira/browse/SPARK-20964
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: Takeshi Yamamuro
>            Priority: Minor
[jira] [Updated] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-22231:
----------------------------------
    Target Version/s: 3.1.0  (was: 3.0.0)

> Support of map, filter, withColumn, dropColumn in nested list of structures
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-22231
>                 URL: https://issues.apache.org/jira/browse/SPARK-22231
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: DB Tsai
>            Assignee: Jeremy Smith
>            Priority: Major
>
> At Netflix's algorithm team, we work on ranking problems to find the great content to fulfill the unique tastes of our members. Before building a recommendation algorithm, we need to prepare the training, testing, and validation datasets in Apache Spark. Due to the nature of ranking problems, we have a nested list of items to be ranked in one column, and the top level is the contexts describing the setting for where a model is to be used (e.g. profiles, country, time, device, etc.). Here is a blog post describing the details: [Distributed Time Travel for Feature Generation|https://medium.com/netflix-techblog/distributed-time-travel-for-feature-generation-389cccdd3907].
> To be more concrete, for the ranks of videos for a given profile_id at a given country, our data schema looks like this:
> {code:java}
> root
>  |-- profile_id: long (nullable = true)
>  |-- country_iso_code: string (nullable = true)
>  |-- items: array (nullable = false)
>  |    |-- element: struct (containsNull = false)
>  |    |    |-- title_id: integer (nullable = true)
>  |    |    |-- scores: double (nullable = true)
> ...
> {code}
> We often need to work on the nested list of structs by applying some functions to them. Sometimes we're dropping or adding new columns in the nested list of structs. Currently, there is no easy solution in open source Apache Spark to perform those operations using SQL primitives; many people just convert the data into an RDD to work on the nested level of data, and then reconstruct the new DataFrame as a workaround. This is extremely inefficient because all the optimizations like predicate pushdown in SQL cannot be performed, we cannot leverage the columnar format, and the serialization and deserialization cost becomes really huge even when we just want to add a new column in the nested level.
> We built a solution internally at Netflix which we're very happy with. We plan to make it open source in Spark upstream. We would like to socialize the API design to see if we missed any use cases.
> The first API we added is *mapItems* on DataFrame, which takes a function from *Column* to *Column* and then applies the function to the nested DataFrame. Here is an example:
> {code:java}
> case class Data(foo: Int, bar: Double, items: Seq[Double])
> val df: Dataset[Data] = spark.createDataset(Seq(
>   Data(10, 10.0, Seq(10.1, 10.2, 10.3, 10.4)),
>   Data(20, 20.0, Seq(20.1, 20.2, 20.3, 20.4))
> ))
> val result = df.mapItems("items") {
>   item => item * 2.0
> }
> result.printSchema()
> // root
> // |-- foo: integer (nullable = false)
> // |-- bar: double (nullable = false)
> // |-- items: array (nullable = true)
> // |    |-- element: double (containsNull = true)
> result.show()
> // +---+----+--------------------+
> // |foo| bar|               items|
> // +---+----+--------------------+
> // | 10|10.0|[20.2, 20.4, 20.6...|
> // | 20|20.0|[40.2, 40.4, 40.6...|
> // +---+----+--------------------+
> {code}
> Now, with the ability to apply a function in the nested DataFrame, we can add a new function, *withColumn*, in *Column* to add or replace the existing column that has the same name in the nested list of struct. Here are two examples demonstrating the API together with *mapItems*; the first one replaces the existing column:
> {code:java}
> case class Item(a: Int, b: Double)
> case class Data(foo: Int, bar: Double, items: Seq[Item])
> val df: Dataset[Data] = spark.createDataset(Seq(
>   Data(10, 10.0, Seq(Item(10, 10.0), Item(11, 11.0))),
>   Data(20, 20.0, Seq(Item(20, 20.0), Item(21, 21.0)))
> ))
> val result = df.mapItems("items") {
>   item => item.withColumn(item("b") + 1 as "b")
> }
> result.printSchema
> // root
> // |-- foo: integer (nullable = false)
> // |-- bar: double (nullable = false)
> // |-- items: array (nullable = true)
> // |    |-- element: struct (containsNull = true)
> // |    |    |-- a: integer (nullable = true)
> // |    |    |-- b: double (nullable = true)
> result.show(false)
> // +---+----+----------------------+
> // |foo|bar |items                 |
> // +---+----+----------------------+
> // |10 |10.0|[[10,11.0], [11,12.0]]|
> // |20 |20.0|[[20,21.0], [21,22.0]]|
> {code}
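For the simple map case, Spark 2.4's higher-order SQL functions already cover part of this ground. A minimal sketch, equivalent to the proposed `df.mapItems("items") { item => item * 2.0 }` on the first schema above:
{code:java}
import org.apache.spark.sql.functions.expr

// transform() is a built-in higher-order function since Spark 2.4; it rewrites
// each array element without round-tripping through an RDD.
val doubled = df.withColumn("items", expr("transform(items, x -> x * 2.0)"))
{code}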
[jira] [Updated] (SPARK-24625) put all the backward compatible behavior change configs under spark.sql.legacy.*
[ https://issues.apache.org/jira/browse/SPARK-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-24625:
----------------------------------
    Target Version/s:   (was: 3.0.0)

> put all the backward compatible behavior change configs under spark.sql.legacy.*
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-24625
>                 URL: https://issues.apache.org/jira/browse/SPARK-24625
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Major
>
> Recently we made several behavior changes to Spark SQL to make it more ANSI SQL compliant or to fix some unreasonable behaviors. For backward compatibility, we added configs to allow users to fall back to the old behavior, and we plan to remove them in Spark 3.0. It's better to put these configs under spark.sql.legacy.*.
[jira] [Commented] (SPARK-24625) put all the backward compatible behavior change configs under spark.sql.legacy.*
[ https://issues.apache.org/jira/browse/SPARK-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028232#comment-17028232 ]

Dongjoon Hyun commented on SPARK-24625:
---------------------------------------

For this, I'll remove the target version first. Please resolve this once it is finished, [~cloud_fan].

> put all the backward compatible behavior change configs under spark.sql.legacy.*
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-24625
>                 URL: https://issues.apache.org/jira/browse/SPARK-24625
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Major
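For illustration, the naming scheme looks like this in practice; `spark.sql.legacy.sizeOfNull` is believed to be one such fallback config in Spark 3.0, but treat the exact name as an assumption:
{code:java}
// Fallback configs live under the spark.sql.legacy.* namespace, e.g. restoring
// the old behavior where size(NULL) returns -1 instead of NULL:
spark.conf.set("spark.sql.legacy.sizeOfNull", "true")
{code}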
[jira] [Updated] (SPARK-24941) Add RDDBarrier.coalesce() function
[ https://issues.apache.org/jira/browse/SPARK-24941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-24941:
----------------------------------
    Target Version/s: 3.1.0  (was: 3.0.0)

> Add RDDBarrier.coalesce() function
> -----------------------------------
>
>                 Key: SPARK-24941
>                 URL: https://issues.apache.org/jira/browse/SPARK-24941
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Xingbo Jiang
>            Priority: Major
>
> https://github.com/apache/spark/pull/21758#discussion_r204917245
> The number of partitions from the input data can be unexpectedly large, e.g. if you do
> {code}
> sc.textFile(...).barrier().mapPartitions()
> {code}
> the number of input partitions is based on the HDFS input splits. We shall provide a way in RDDBarrier to enable users to specify the number of tasks in a barrier stage. Maybe something like RDDBarrier.coalesce(numPartitions: Int).
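Until such an API exists, the partition count can be controlled before entering barrier mode. A minimal sketch using the existing RDD.coalesce; the path and partition count are illustrative:
{code:java}
// Workaround today: shrink partitions first, then switch to barrier mode, so the
// barrier stage launches a predictable number of tasks.
val rdd = sc.textFile("hdfs://path/to/input")
  .coalesce(4)                  // cap the barrier stage at 4 tasks
  .barrier()
  .mapPartitions(it => it)      // stand-in for the real barrier computation
{code}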
[jira] [Commented] (SPARK-25531) new write APIs for data source v2
[ https://issues.apache.org/jira/browse/SPARK-25531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028231#comment-17028231 ]

Dongjoon Hyun commented on SPARK-25531:
---------------------------------------

I moved this to 3.1.0 for now. If we can resolve this issue, you can change it back to 3.0.0.

> new write APIs for data source v2
> ----------------------------------
>
>                 Key: SPARK-25531
>                 URL: https://issues.apache.org/jira/browse/SPARK-25531
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Major
>
> The current data source write API heavily depends on {{SaveMode}}, which doesn't have clear semantics, especially when writing to tables.
> We should design a new set of write APIs without {{SaveMode}}.
[jira] [Updated] (SPARK-25531) new write APIs for data source v2
[ https://issues.apache.org/jira/browse/SPARK-25531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-25531:
----------------------------------
    Target Version/s: 3.1.0  (was: 3.0.0)

> new write APIs for data source v2
> ----------------------------------
>
>                 Key: SPARK-25531
>                 URL: https://issues.apache.org/jira/browse/SPARK-25531
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Major
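For reference, a sketch of what SaveMode-free writes look like. The method names follow the DataFrameWriterV2 API added for 3.0, but the exact shape here is illustrative, not a statement of this ticket's outcome:
{code:java}
// Each intent is a distinct, explicit method instead of an ambiguous SaveMode:
df.writeTo("catalog.db.events").append()                // fails if the table does not exist
df.writeTo("catalog.db.events").overwritePartitions()   // replaces only matching partitions
df.writeTo("catalog.db.events").createOrReplace()       // explicit DDL semantics
{code}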
[jira] [Updated] (SPARK-24942) Improve cluster resource management with jobs containing barrier stage
[ https://issues.apache.org/jira/browse/SPARK-24942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-24942:
----------------------------------
    Target Version/s: 3.1.0  (was: 3.0.0)

> Improve cluster resource management with jobs containing barrier stage
> -----------------------------------------------------------------------
>
>                 Key: SPARK-24942
>                 URL: https://issues.apache.org/jira/browse/SPARK-24942
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Xingbo Jiang
>            Priority: Major
>
> https://github.com/apache/spark/pull/21758#discussion_r205652317
> We shall improve cluster resource management to address the following issues:
> - With dynamic resource allocation enabled, it may happen that we acquire some executors (but not enough to launch all the tasks in a barrier stage), later release them when the executor idle timeout expires, and then acquire them again.
> - There can be a deadlock between two concurrent applications. Each application may acquire some resources, but not enough to launch all the tasks in a barrier stage. And after hitting the idle timeout and releasing them, they may acquire resources again, but just continually trade resources between each other.
[jira] [Updated] (SPARK-25383) Image data source supports sample pushdown
[ https://issues.apache.org/jira/browse/SPARK-25383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-25383:
----------------------------------
    Target Version/s: 3.1.0  (was: 3.0.0)

> Image data source supports sample pushdown
> -------------------------------------------
>
>                 Key: SPARK-25383
>                 URL: https://issues.apache.org/jira/browse/SPARK-25383
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, SQL
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Priority: Major
>
> After SPARK-25349, we should update the image data source to support sampling.
[jira] [Commented] (SPARK-25531) new write APIs for data source v2
[ https://issues.apache.org/jira/browse/SPARK-25531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028230#comment-17028230 ]

Dongjoon Hyun commented on SPARK-25531:
---------------------------------------

Hi, [~cloud_fan]. Could you resolve this issue or adjust the target version to `3.1.0`?

> new write APIs for data source v2
> ----------------------------------
>
>                 Key: SPARK-25531
>                 URL: https://issues.apache.org/jira/browse/SPARK-25531
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Major
[jira] [Updated] (SPARK-26425) Add more constraint checks in file streaming source to avoid checkpoint corruption
[ https://issues.apache.org/jira/browse/SPARK-26425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-26425:
----------------------------------
    Target Version/s: 3.1.0  (was: 3.0.0)

> Add more constraint checks in file streaming source to avoid checkpoint corruption
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-26425
>                 URL: https://issues.apache.org/jira/browse/SPARK-26425
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Major
>
> Two issues observed in production:
> - HDFSMetadataLog.getLatest() tries to read older versions when it is not able to read the latest listed version file. It is unclear why this was done, but it should not be: if the latest listed file is not readable, then something is horribly wrong and we should fail rather than report an older version, as that can completely corrupt the checkpoint directory.
> - FileStreamSource should check whether adding a new batch to the FileStreamSourceLog succeeded or not (similar to how StreamExecution checks the OffsetSeqLog).
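A hypothetical sketch of what the second check could look like; `metadataLog`, `currentBatchId`, and `newFiles` are illustrative names, not the actual FileStreamSource fields:
{code:java}
// Fail fast instead of silently continuing when the batch cannot be recorded
// in the source log (e.g. because another writer already claimed the batch id).
if (!metadataLog.add(currentBatchId, newFiles)) {
  throw new IllegalStateException(
    s"Failed to record batch $currentBatchId in the FileStreamSourceLog; " +
      "another query may be running against the same checkpoint location.")
}
{code}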
[jira] [Updated] (SPARK-25752) Add trait to easily whitelist logical operators that produce named output from CleanupAliases
[ https://issues.apache.org/jira/browse/SPARK-25752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-25752:
----------------------------------
    Target Version/s: 3.1.0  (was: 3.0.0)

> Add trait to easily whitelist logical operators that produce named output from CleanupAliases
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25752
>                 URL: https://issues.apache.org/jira/browse/SPARK-25752
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Minor
>
> The rule `CleanupAliases` cleans up aliases from logical operators that do not match a whitelist. This whitelist is hardcoded inside the rule, which is cumbersome. This PR cleans that up by adding a trait `HasNamedOutput` that is ignored by `CleanupAliases`; operators that require aliases to be preserved should extend it.
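A hypothetical sketch of the proposed marker trait and how the rule would consult it instead of a hardcoded list:
{code:java}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Operators that must keep their aliases mix this in; CleanupAliases then
// whitelists by trait rather than by enumerating operator classes.
trait HasNamedOutput { self: LogicalPlan => }

object CleanupAliasesSketch extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case keep: LogicalPlan with HasNamedOutput => keep
    // case other => ... trim redundant aliases, as the real rule does today
  }
}
{code}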
[jira] [Resolved] (SPARK-27471) Reorganize public v2 catalog API
[ https://issues.apache.org/jira/browse/SPARK-27471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-27471.
-----------------------------------
      Assignee: Ryan Blue
    Resolution: Done

> Reorganize public v2 catalog API
> ---------------------------------
>
>                 Key: SPARK-27471
>                 URL: https://issues.apache.org/jira/browse/SPARK-27471
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>            Priority: Blocker
>
> In the review for SPARK-27181, Reynold suggested some package moves. We've decided (at the v2 community sync) not to delay by having this discussion now because we want to get the new catalog API in so we can work on more logical plans in parallel. But we do need to make sure we have a sane package scheme for the next release.
[jira] [Commented] (SPARK-27471) Reorganize public v2 catalog API
[ https://issues.apache.org/jira/browse/SPARK-27471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028229#comment-17028229 ]

Dongjoon Hyun commented on SPARK-27471:
---------------------------------------

I believe we've done what we need for 3.0.0. For the remaining reorganization, we can file another JIRA.

> Reorganize public v2 catalog API
> ---------------------------------
>
>                 Key: SPARK-27471
>                 URL: https://issues.apache.org/jira/browse/SPARK-27471
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Ryan Blue
>            Priority: Blocker
[jira] [Updated] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade
[ https://issues.apache.org/jira/browse/SPARK-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-27780:
----------------------------------
    Target Version/s: 3.1.0  (was: 3.0.0)

> Shuffle server & client should be versioned to enable smoother upgrade
> -----------------------------------------------------------------------
>
>                 Key: SPARK-27780
>                 URL: https://issues.apache.org/jira/browse/SPARK-27780
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Imran Rashid
>            Priority: Major
>
> The external shuffle service is often upgraded at a different time than Spark itself. However, this causes problems when the protocol changes between the shuffle service and the Spark runtime -- it forces users to upgrade everything simultaneously.
> We should add versioning to the shuffle client & server, so they know what messages the other will support. This would allow better handling of mixed versions, from better error msgs to allowing some mismatched versions (with reduced capabilities).
> This originally came up in a discussion here: https://github.com/apache/spark/pull/24565#issuecomment-493496466
> There are a few ways we could do the versioning, which we still need to discuss:
> 1) Version specified by config. This allows for mixed versions across the cluster and rolling upgrades. It also will let a Spark 3.0 client talk to a 2.4 shuffle service. But it may be a nuisance for users to get this right.
> 2) Auto-detection during registration with the local shuffle service. This makes the versioning easy for the end user, and can even handle a 2.4 shuffle service even though it does not support the new versioning. However, it will not handle a rolling upgrade correctly -- if the local shuffle service has been upgraded but other nodes in the cluster have not, it will get the version wrong.
> 3) Exchange versions per-connection. When a connection is opened, the server & client could first exchange messages with their versions, so they know how to continue communication after that.
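A rough sketch of what option 3 could look like on the wire; all names and the negotiation rule are assumptions, not an existing Spark protocol:
{code:java}
// Each side sends its protocol version when the connection opens, then both
// agree to speak the lower of the two so mixed clusters keep working.
final case class ShuffleProtocolVersion(major: Int, minor: Int)
    extends Ordered[ShuffleProtocolVersion] {
  def compare(that: ShuffleProtocolVersion): Int =
    if (major != that.major) major - that.major else minor - that.minor
}

def negotiate(client: ShuffleProtocolVersion,
              server: ShuffleProtocolVersion): ShuffleProtocolVersion =
  if (client < server) client else server
{code}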
[jira] [Updated] (SPARK-27936) Support local dependency uploading from --py-files
[ https://issues.apache.org/jira/browse/SPARK-27936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-27936:
----------------------------------
    Target Version/s: 3.1.0

> Support local dependency uploading from --py-files
> ---------------------------------------------------
>
>                 Key: SPARK-27936
>                 URL: https://issues.apache.org/jira/browse/SPARK-27936
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 3.0.0
>            Reporter: Erik Erlandson
>            Priority: Major
>
> Support Python dependency uploads, as in SPARK-23153.
[jira] [Updated] (SPARK-28629) Capture the missing rules in HiveSessionStateBuilder
[ https://issues.apache.org/jira/browse/SPARK-28629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-28629:
----------------------------------
    Target Version/s: 3.1.0  (was: 3.0.0)

> Capture the missing rules in HiveSessionStateBuilder
> -----------------------------------------------------
>
>                 Key: SPARK-28629
>                 URL: https://issues.apache.org/jira/browse/SPARK-28629
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xiao Li
>            Priority: Major
>
> A common mistake for new contributors is forgetting to add the corresponding rules to extendedResolutionRules, postHocResolutionRules, or extendedCheckRules in HiveSessionStateBuilder. We need to either avoid missing the rules or capture them.
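For third-party rules there is already a path that avoids the hardcoded lists: registering through SparkSessionExtensions wires the rule into the analyzer automatically. A minimal sketch with an illustrative no-op rule:
{code:java}
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

case class MyRule(session: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan  // no-op placeholder
}

// Activated via spark.sql.extensions=...MyExtensions; the rule is registered
// once and runs alongside the built-in resolution rules.
class MyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(ext: SparkSessionExtensions): Unit =
    ext.injectResolutionRule(session => MyRule(session))
}
{code}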
[jira] [Updated] (SPARK-28717) Update SQL ALTER TABLE RENAME to use TableCatalog API
[ https://issues.apache.org/jira/browse/SPARK-28717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-28717:
----------------------------------
    Target Version/s:   (was: 3.0.0)

> Update SQL ALTER TABLE RENAME to use TableCatalog API
> ------------------------------------------------------
>
>                 Key: SPARK-28717
>                 URL: https://issues.apache.org/jira/browse/SPARK-28717
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Edgar Rodriguez
>            Priority: Major
>
> Follow-up from SPARK-28265.
> The SQL implementation of ALTER TABLE RENAME needs to be updated to use the TableCatalog API operation {{renameTable}} - having something like:
> {code:java}
> ALTER TABLE [catalog_name] [namespace_name] table_name
> TO [new_namespace_name] new_table_name
> {code}
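A sketch of the TableCatalog call the SQL path would delegate to; {{renameTable}} is part of the v2 catalog API, while the identifiers here are illustrative:
{code:java}
import org.apache.spark.sql.connector.catalog.{Identifier, TableCatalog}

def rename(catalog: TableCatalog): Unit = {
  val from = Identifier.of(Array("db"), "old_name")
  val to   = Identifier.of(Array("db"), "new_name")
  catalog.renameTable(from, to)   // the catalog plugin performs the actual rename
}
{code}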
[jira] [Updated] (SPARK-27936) Support local dependency uploading from --py-files
[ https://issues.apache.org/jira/browse/SPARK-27936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-27936:
----------------------------------
    Target Version/s:   (was: 3.0.0)

> Support local dependency uploading from --py-files
> ---------------------------------------------------
>
>                 Key: SPARK-27936
>                 URL: https://issues.apache.org/jira/browse/SPARK-27936
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 3.0.0
>            Reporter: Erik Erlandson
>            Priority: Major
>
> Support Python dependency uploads, as in SPARK-23153.
[jira] [Resolved] (SPARK-30097) Adding support for core writers
[ https://issues.apache.org/jira/browse/SPARK-30097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-30097.
-----------------------------------
    Resolution: Won't Do

> Adding support for core writers
> --------------------------------
>
>                 Key: SPARK-30097
>                 URL: https://issues.apache.org/jira/browse/SPARK-30097
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL, Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: German Schiavon Matteo
>            Priority: Minor
>
> When using *writeStream* we always have to use *format("xxx")* in order to target the selected sink, while in *readStream* you can use *.parquet* directly.
> Basically, this is to add support for the core writers to *writeStream*. Example:
> {code:java}
> writeStream
>   .outputMode("append")
>   .partitionBy("id")
>   .options(options)
>   .parquet(path)
> {code}
[jira] [Updated] (SPARK-30097) Adding support for core writers
[ https://issues.apache.org/jira/browse/SPARK-30097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30097:
----------------------------------
    Target Version/s:   (was: 3.0.0)

> Adding support for core writers
> --------------------------------
>
>                 Key: SPARK-30097
>                 URL: https://issues.apache.org/jira/browse/SPARK-30097
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL, Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: German Schiavon Matteo
>            Priority: Minor
[jira] [Updated] (SPARK-30186) support Dynamic Partition Pruning in Adaptive Execution
[ https://issues.apache.org/jira/browse/SPARK-30186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30186:
----------------------------------
    Fix Version/s:   (was: 3.0.0)

> support Dynamic Partition Pruning in Adaptive Execution
> ---------------------------------------------------------
>
>                 Key: SPARK-30186
>                 URL: https://issues.apache.org/jira/browse/SPARK-30186
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xiaoju Wu
>            Priority: Major
>
> Currently Adaptive Execution cannot work if Dynamic Partition Pruning is applied:
> {code:java}
> private def supportAdaptive(plan: SparkPlan): Boolean = {
>   // TODO migrate dynamic-partition-pruning onto adaptive execution.
>   sanityCheck(plan) &&
>     !plan.logicalLink.exists(_.isStreaming) &&
>     !plan.expressions.exists(_.find(_.isInstanceOf[DynamicPruningSubquery]).isDefined) &&
>     plan.children.forall(supportAdaptive)
> }
> {code}
> This means we cannot get the performance benefit of both AE and DPP. This ticket targets making DPP and AE work together.
[jira] [Updated] (SPARK-30186) support Dynamic Partition Pruning in Adaptive Execution
[ https://issues.apache.org/jira/browse/SPARK-30186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30186:
----------------------------------
    Target Version/s:   (was: 3.0.0)

> support Dynamic Partition Pruning in Adaptive Execution
> ---------------------------------------------------------
>
>                 Key: SPARK-30186
>                 URL: https://issues.apache.org/jira/browse/SPARK-30186
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xiaoju Wu
>            Priority: Major
[jira] [Closed] (SPARK-30097) Adding support for core writers
[ https://issues.apache.org/jira/browse/SPARK-30097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun closed SPARK-30097.
---------------------------------

> Adding support for core writers
> --------------------------------
>
>                 Key: SPARK-30097
>                 URL: https://issues.apache.org/jira/browse/SPARK-30097
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL, Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: German Schiavon Matteo
>            Priority: Minor
[jira] [Updated] (SPARK-30334) Add metadata around semi-structured columns to Spark
[ https://issues.apache.org/jira/browse/SPARK-30334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30334:
----------------------------------
    Target Version/s: 3.1.0  (was: 3.0.0)

> Add metadata around semi-structured columns to Spark
> -----------------------------------------------------
>
>                 Key: SPARK-30334
>                 URL: https://issues.apache.org/jira/browse/SPARK-30334
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Burak Yavuz
>            Priority: Major
>
> Semi-structured data is used widely in the data industry for reporting events in a wide variety of formats. Click events in product analytics can be stored as JSON. Some application logs can be in the form of delimited key=value text. Some data may be in XML.
> The goal of this project is to be able to signal Spark that such a column exists. This will then enable Spark to "auto-parse" these columns on the fly. The proposal is to store this information as part of the column metadata, in the fields:
> - format: the format of the semi-structured column, e.g. json, xml, avro
> - options: options for parsing these columns
> Then imagine having the following data:
> {code}
> +------------+-------+-------------------+
> |     ts     | event |        raw        |
> +------------+-------+-------------------+
> | 2019-10-12 | click | {"field":"value"} |
> +------------+-------+-------------------+
> {code}
> SELECT raw.field FROM data
> will return "value". Or, with the following data:
> {code}
> +------------+-------+---------------------+
> |     ts     | event |         raw         |
> +------------+-------+---------------------+
> | 2019-10-12 | click | field1=v1|field2=v2 |
> +------------+-------+---------------------+
> {code}
> SELECT raw.field1 FROM data
> will return v1.
> As a first step, we will introduce the function "as_json", which accomplishes this for JSON columns.
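A sketch of what the proposed metadata would automate; today the schema and the parse step are spelled out by hand (`df` and the column names are illustrative):
{code:java}
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Manual equivalent of "auto-parsing": declare the schema, parse the string
// column, then navigate the resulting struct.
val rawSchema = StructType(Seq(StructField("field", StringType)))
val parsed = df.withColumn("raw", from_json(df("raw"), rawSchema))
parsed.select("raw.field")   // with as_json metadata, the string column would support this directly
{code}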
[jira] [Updated] (SPARK-30324) Simplify API for JSON access in DataFrames/SQL
[ https://issues.apache.org/jira/browse/SPARK-30324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30324:
----------------------------------
    Target Version/s: 3.1.0  (was: 3.0.0)

> Simplify API for JSON access in DataFrames/SQL
> -----------------------------------------------
>
>                 Key: SPARK-30324
>                 URL: https://issues.apache.org/jira/browse/SPARK-30324
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Burak Yavuz
>            Priority: Major
>
> get_json_object() is a UDF to parse JSON fields. It is verbose and hard to use; e.g., I wasn't expecting the path to a field to have to start with "$.". We can simplify all of this when a column is of StringType and a nested field is requested. In the query planner, this API sugar will be rewritten as get_json_object.
> This nested access can then be extended in the future to other semi-structured formats.
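The verbose form today versus the proposed sugar, side by side; `df` and the column name are illustrative:
{code:java}
import org.apache.spark.sql.functions.get_json_object

// Today: explicit UDF call, and the path must start with "$.", which the
// ticket calls out as surprising.
df.select(get_json_object(df("raw"), "$.field").alias("field"))

// Proposed (hypothetical) sugar: StringType columns traversed like structs,
// rewritten to get_json_object by the planner.
// df.select("raw.field")
{code}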
[jira] [Updated] (SPARK-30567) setDelegateCatalog should be called if catalog has implemented CatalogExtension
[ https://issues.apache.org/jira/browse/SPARK-30567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30567:
----------------------------------
    Target Version/s:   (was: 3.1.0)

> setDelegateCatalog should be called if catalog has implemented CatalogExtension
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-30567
>                 URL: https://issues.apache.org/jira/browse/SPARK-30567
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: yu jiantao
>            Priority: Major
>
> CatalogManager.catalog calls Catalogs.load to load a catalog if it is not 'spark_catalog'. If the catalog implements CatalogExtension, setDelegateCatalog is not called when the catalog is loaded, unlike what we do for v2SessionCatalog. That is confusing for customized session catalogs, such as Iceberg's SparkSessionCatalog.
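A hypothetical sketch of the wiring the report expects in CatalogManager: when the loaded plugin is a CatalogExtension, hand it the built-in session catalog to delegate to. Helper and parameter names are illustrative:
{code:java}
import org.apache.spark.sql.connector.catalog.{CatalogExtension, CatalogPlugin}

def wireDelegate(loaded: CatalogPlugin, sessionCatalog: CatalogPlugin): CatalogPlugin = {
  loaded match {
    case ext: CatalogExtension => ext.setDelegateCatalog(sessionCatalog)
    case _ => // plain catalogs manage their own metadata; nothing to delegate
  }
  loaded
}
{code}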
[jira] [Updated] (SPARK-30667) Support simple all gather in barrier task context
[ https://issues.apache.org/jira/browse/SPARK-30667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30667:
----------------------------------
    Target Version/s: 3.1.0  (was: 3.0.0)

> Support simple all gather in barrier task context
> ---------------------------------------------------
>
>                 Key: SPARK-30667
>                 URL: https://issues.apache.org/jira/browse/SPARK-30667
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark, Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Priority: Major
>
> Currently we offer task.barrier() to coordinate tasks in barrier mode. Tasks can see all IP addresses from BarrierTaskContext. It would be simpler to integrate with distributed frameworks like TensorFlow DistributionStrategy if we provided an all-gather that lets tasks share additional information with the others, e.g., an available port.
> Note that with all-gather, tasks share their IP addresses as well.
> {code}
> port = ...  # get an available port
> ports = context.all_gather(port)  # get all available ports, ordered by task ID
> ...  # set up distributed training service
> {code}
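The Scala counterpart of the Python sketch above; treat the allGather signature as an assumption in the spirit of the proposal, and `rdd` and the port logic as illustrative:
{code:java}
import org.apache.spark.BarrierTaskContext

val hosts = rdd.barrier().mapPartitions { iter =>
  val ctx = BarrierTaskContext.get()
  val port = 50000 + ctx.partitionId()        // stand-in for "find a free port"
  val ports = ctx.allGather(port.toString)    // one entry per task, ordered by task ID
  // ... set up the distributed training service using `ports` ...
  iter
}
{code}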
[jira] [Updated] (SPARK-30567) setDelegateCatalog should be called if catalog has implemented CatalogExtension
[ https://issues.apache.org/jira/browse/SPARK-30567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30567:
----------------------------------
    Fix Version/s:   (was: 3.0.0)

> setDelegateCatalog should be called if catalog has implemented CatalogExtension
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-30567
>                 URL: https://issues.apache.org/jira/browse/SPARK-30567
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: yu jiantao
>            Priority: Major
[jira] [Updated] (SPARK-30567) setDelegateCatalog should be called if catalog has implemented CatalogExtension
[ https://issues.apache.org/jira/browse/SPARK-30567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30567:
----------------------------------
    Target Version/s: 3.1.0  (was: 3.0.0)

> setDelegateCatalog should be called if catalog has implemented CatalogExtension
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-30567
>                 URL: https://issues.apache.org/jira/browse/SPARK-30567
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: yu jiantao
>            Priority: Major
[jira] [Updated] (SPARK-24941) Add RDDBarrier.coalesce() function
[ https://issues.apache.org/jira/browse/SPARK-24941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-24941:
----------------------------------
    Target Version/s: 3.1.0

> Add RDDBarrier.coalesce() function
> -----------------------------------
>
>                 Key: SPARK-24941
>                 URL: https://issues.apache.org/jira/browse/SPARK-24941
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Xingbo Jiang
>            Priority: Major
[jira] [Updated] (SPARK-30334) Add metadata around semi-structured columns to Spark
[ https://issues.apache.org/jira/browse/SPARK-30334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30334:
----------------------------------
    Target Version/s: 3.1.0

> Add metadata around semi-structured columns to Spark
> -----------------------------------------------------
>
>                 Key: SPARK-30334
>                 URL: https://issues.apache.org/jira/browse/SPARK-30334
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Burak Yavuz
>            Priority: Major
[jira] [Updated] (SPARK-20964) Make some keywords reserved along with the ANSI/SQL standard
[ https://issues.apache.org/jira/browse/SPARK-20964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-20964:
----------------------------------
    Target Version/s: 3.1.0

> Make some keywords reserved along with the ANSI/SQL standard
> -------------------------------------------------------------
>
>                 Key: SPARK-20964
>                 URL: https://issues.apache.org/jira/browse/SPARK-20964
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: Takeshi Yamamuro
>            Priority: Minor
[jira] [Updated] (SPARK-30186) support Dynamic Partition Pruning in Adaptive Execution
[ https://issues.apache.org/jira/browse/SPARK-30186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30186: -- Target Version/s: 3.1.0 > support Dynamic Partition Pruning in Adaptive Execution > --- > > Key: SPARK-30186 > URL: https://issues.apache.org/jira/browse/SPARK-30186 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiaoju Wu >Priority: Major > Fix For: 3.0.0 > > > Currently Adaptive Execution cannot work if Dynamic Partition Pruning is > applied. > {code} > private def supportAdaptive(plan: SparkPlan): Boolean = { > // TODO migrate dynamic-partition-pruning onto adaptive execution. > sanityCheck(plan) && > !plan.logicalLink.exists(_.isStreaming) && > !plan.expressions.exists(_.find(_.isInstanceOf[DynamicPruningSubquery]).isDefined) && > plan.children.forall(supportAdaptive) > } > {code} > The DynamicPruningSubquery check means we cannot get the performance benefits > of AE and DPP at the same time. > This ticket targets making DPP and AE work together. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25531) new write APIs for data source v2
[ https://issues.apache.org/jira/browse/SPARK-25531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25531: -- Target Version/s: 3.1.0 > new write APIs for data source v2 > - > > Key: SPARK-25531 > URL: https://issues.apache.org/jira/browse/SPARK-25531 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Priority: Major > > The current data source write API heavily depends on {{SaveMode}}, which > doesn't have clear semantics, especially when writing to tables. > We should design a new set of write APIs without {{SaveMode}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
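For reference, the SaveMode-based write path being criticized, as a minimal sketch assuming a DataFrame `df`; the unclear semantics are that the meaning of a mode is source-dependent.
{code:java}
import org.apache.spark.sql.SaveMode

// The v1-style write path the ticket wants to replace: whether Overwrite
// drops and recreates the table or only replaces its contents depends on
// the underlying source implementation.
df.write.mode(SaveMode.Overwrite).saveAsTable("db.events")
{code}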
[jira] [Updated] (SPARK-24942) Improve cluster resource management with jobs containing barrier stage
[ https://issues.apache.org/jira/browse/SPARK-24942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24942: -- Target Version/s: 3.1.0 > Improve cluster resource management with jobs containing barrier stage > -- > > Key: SPARK-24942 > URL: https://issues.apache.org/jira/browse/SPARK-24942 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Xingbo Jiang >Priority: Major > > https://github.com/apache/spark/pull/21758#discussion_r205652317 > We shall improve cluster resource management to address the following issues: > - With dynamic resource allocation enabled, it may happen that we acquire > some executors (but not enough to launch all the tasks in a barrier stage), > later release them when the executor idle timeout expires, and then acquire > them again. > - There can be a deadlock between two concurrent applications. Each application > may acquire some resources, but not enough to launch all the tasks in a > barrier stage. After hitting the idle timeout and releasing their resources, > they may acquire them again, and simply keep trading resources with each > other. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
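The configuration knobs involved in the acquire-release cycle described above, as a hedged sketch; the values are illustrative only.
{code:java}
import org.apache.spark.sql.SparkSession

// With these set, executors acquired for a barrier stage can be released
// by the idle timeout before enough slots ever accumulate to launch it.
val spark = SparkSession.builder()
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .getOrCreate()
{code}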
[jira] [Updated] (SPARK-30667) Support simple all gather in barrier task context
[ https://issues.apache.org/jira/browse/SPARK-30667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30667: -- Target Version/s: 3.1.0 > Support simple all gather in barrier task context > - > > Key: SPARK-30667 > URL: https://issues.apache.org/jira/browse/SPARK-30667 > Project: Spark > Issue Type: New Feature > Components: PySpark, Spark Core >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Priority: Major > > Currently we offer task.barrier() to coordinate tasks in barrier mode. Tasks > can see all IP addresses from BarrierTaskContext. It would be simpler to > integrate with distributed frameworks like TensorFlow DistributionStrategy if > we provide an all-gather that lets tasks share additional information with > others, e.g., an available port. > Note that with all-gather, tasks share their IP addresses as well. > {code} > port = ... # get an available port > ports = context.all_gather(port) # get all available ports, ordered by task ID > ... # set up distributed training service > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade
[ https://issues.apache.org/jira/browse/SPARK-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27780: -- Target Version/s: 3.1.0 > Shuffle server & client should be versioned to enable smoother upgrade > -- > > Key: SPARK-27780 > URL: https://issues.apache.org/jira/browse/SPARK-27780 > Project: Spark > Issue Type: New Feature > Components: Shuffle, Spark Core >Affects Versions: 3.0.0 >Reporter: Imran Rashid >Priority: Major > > The external shuffle service is often upgraded at a different time than spark > itself. However, this causes problems when the protocol changes between the > shuffle service and the spark runtime -- this forces users to upgrade > everything simultaneously. > We should add versioning to the shuffle client & server, so they know what > messages the other will support. This would allow better handling of mixed > versions, from better error msgs to allowing some mismatched versions (with > reduced capabilities). > This originally came up in a discussion here: > https://github.com/apache/spark/pull/24565#issuecomment-493496466 > There are a few ways we could do the versioning which we still need to > discuss: > 1) Version specified by config. This allows for mixed versions across the > cluster and rolling upgrades. It also will let a spark 3.0 client talk to a > 2.4 shuffle service. But, may be a nuisance for users to get this right. > 2) Auto-detection during registration with local shuffle service. This makes > the versioning easy for the end user, and can even handle a 2.4 shuffle > service though it does not support the new versioning. However, it will not > handle a rolling upgrade correctly -- if the local shuffle service has been > upgraded, but other nodes in the cluster have not, it will get the version > wrong. > 3) Exchange versions per-connection. When a connection is opened, the server > & client could first exchange messages with their versions, so they know how > to continue communication after that. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
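A hedged sketch of what option 3 above could look like; this is not Spark code, and both the message shape and the negotiation rule are hypothetical illustrations.
{code:java}
// On connection open, both sides exchange their version and then speak
// the older of the two protocols, dropping capabilities the peer lacks.
case class ShuffleProtocolVersion(major: Int, minor: Int)

object ShuffleProtocolVersion {
  def negotiate(client: ShuffleProtocolVersion,
                server: ShuffleProtocolVersion): ShuffleProtocolVersion =
    Seq(client, server).minBy(v => (v.major, v.minor))
}
{code}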
[jira] [Updated] (SPARK-26425) Add more constraint checks in file streaming source to avoid checkpoint corruption
[ https://issues.apache.org/jira/browse/SPARK-26425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-26425: -- Target Version/s: 3.1.0 > Add more constraint checks in file streaming source to avoid checkpoint > corruption > -- > > Key: SPARK-26425 > URL: https://issues.apache.org/jira/browse/SPARK-26425 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Major > > Two issues observed in production. > - HDFSMetadataLog.getLatest() tries to read older versions when it is not > able to read the latest listed version file. Not sure why this was done, but > it should not be. If the latest listed file is not readable, then > something is horribly wrong and we should fail rather than report an older > version, as that can completely corrupt the checkpoint directory. > - FileStreamSource should check whether adding a new batch to the > FileStreamSourceLog succeeded or not (similar to how StreamExecution checks > for the OffsetSeqLog) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
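A hedged sketch of the second check, mirroring the OffsetSeqLog pattern mentioned above; `metadataLog`, `batchId`, and `batchFiles` are assumed to be in scope inside the source, and the message is illustrative.
{code:java}
// Fail fast when the source log rejects a batch instead of silently
// continuing (HDFSMetadataLog.add returns false on a conflicting write).
if (!metadataLog.add(batchId, batchFiles)) {
  throw new IllegalStateException(
    s"Concurrent update to the log. Multiple streaming jobs detected for batch $batchId")
}
{code}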
[jira] [Updated] (SPARK-30567) setDelegateCatalog should be called if catalog has implemented CatalogExtension
[ https://issues.apache.org/jira/browse/SPARK-30567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30567: -- Target Version/s: 3.1.0 > setDelegateCatalog should be called if catalog has implemented > CatalogExtension > --- > > Key: SPARK-30567 > URL: https://issues.apache.org/jira/browse/SPARK-30567 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: yu jiantao >Priority: Major > Fix For: 3.0.0 > > > CatalogManager.catalog calls Catalogs.load to load a catalog if it is not > 'spark_catalog'. If the catalog has implemented CatalogExtension, > setDelegateCatalog is not called when the catalog is loaded, which is unlike > what we do for v2SessionCatalog, and that causes confusion for customized > session catalogs like Iceberg's SparkSessionCatalog. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
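A hedged sketch of the wiring the ticket asks for, not the actual Spark code; `loaded` stands for the plugin returned by Catalogs.load and `sessionCatalog` for the built-in session catalog, both assumed to be in scope.
{code:java}
import org.apache.spark.sql.connector.catalog.{CatalogExtension, CatalogPlugin}

// After loading a non-session catalog, wire in the delegate when the
// plugin is a CatalogExtension, mirroring what is done for v2SessionCatalog.
loaded match {
  case ext: CatalogExtension => ext.setDelegateCatalog(sessionCatalog)
  case _: CatalogPlugin      => // plain plugins need no delegate
}
{code}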
[jira] [Updated] (SPARK-30324) Simplify API for JSON access in DataFrames/SQL
[ https://issues.apache.org/jira/browse/SPARK-30324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30324: -- Target Version/s: 3.1.0 > Simplify API for JSON access in DataFrames/SQL > -- > > Key: SPARK-30324 > URL: https://issues.apache.org/jira/browse/SPARK-30324 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.4 >Reporter: Burak Yavuz >Priority: Major > > get_json_object() is a UDF to parse JSON fields. It is verbose and hard to > use; e.g., I wasn't expecting the path to a field to have to start with "$.". > We can simplify all of this when a column is of StringType and a nested > field is requested. This API sugar will be rewritten in the query planner as > get_json_object. > This nested access can then be extended in the future to other > semi-structured formats. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
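For comparison, today's verbose form next to the proposed sugar, assuming a DataFrame `df` with a JSON string column `raw`; the sugared form is hypothetical until the ticket lands.
{code:java}
import org.apache.spark.sql.functions.get_json_object

// Today: explicit JSONPath, note the mandatory "$." prefix.
df.select(get_json_object(df("raw"), "$.field"))
// Proposed sugar (hypothetical): df.select(df("raw.field"))
{code}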
[jira] [Updated] (SPARK-30097) Adding support for core writers
[ https://issues.apache.org/jira/browse/SPARK-30097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30097: -- Target Version/s: 3.1.0 > Adding support for core writers > > > Key: SPARK-30097 > URL: https://issues.apache.org/jira/browse/SPARK-30097 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 3.0.0 >Reporter: German Schiavon Matteo >Priority: Minor > > When using *writeStream* we always have to use *format("xxx")* in order to > target the selected sink, while in *readStream* you can directly use > *.parquet*. > Basically this is to add support for the core writers to *writeStream*. > Example: > > {code:java} > writeStream > .outputMode("append") > .partitionBy("id") > .options(options) > .parquet(path) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24625) put all the backward compatible behavior change configs under spark.sql.legacy.*
[ https://issues.apache.org/jira/browse/SPARK-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24625: -- Target Version/s: 3.1.0 > put all the backward compatible behavior change configs under > spark.sql.legacy.* > > > Key: SPARK-24625 > URL: https://issues.apache.org/jira/browse/SPARK-24625 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Priority: Major > > Recently we made several behavior changes to Spark SQL, to make it more ANSI > SQL compliant or fix some unreasonable behaviors. For backward compatibility, > we add configs to allow users to fall back to the old behavior, and plan to > remove them in Spark 3.0. It's better to put these configs under spark.sql.legacy.* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19842) Informational Referential Integrity Constraints Support in Spark
[ https://issues.apache.org/jira/browse/SPARK-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-19842: -- Target Version/s: 3.1.0 > Informational Referential Integrity Constraints Support in Spark > > > Key: SPARK-19842 > URL: https://issues.apache.org/jira/browse/SPARK-19842 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Ioana Delaney >Priority: Major > Attachments: InformationalRIConstraints.doc > > > *Informational Referential Integrity Constraints Support in Spark* > This work proposes support for _informational primary key_ and _foreign key > (referential integrity) constraints_ in Spark. The main purpose is to open up > an area of query optimization techniques that rely on referential integrity > constraints semantics. > An _informational_ or _statistical constraint_ is a constraint such as a > _unique_, _primary key_, _foreign key_, or _check constraint_, that can be > used by Spark to improve query performance. Informational constraints are not > enforced by the Spark SQL engine; rather, they are used by Catalyst to > optimize the query processing. They provide semantics information that allows > Catalyst to rewrite queries to eliminate joins, push down aggregates, remove > unnecessary Distinct operations, and perform a number of other optimizations. > Informational constraints are primarily targeted to applications that load > and analyze data that originated from a data warehouse. For such > applications, the conditions for a given constraint are known to be true, so > the constraint does not need to be enforced during data load operations. > The attached document covers constraint definition, metastore storage, > constraint validation, and maintenance. The document shows many examples of > query performance improvements that utilize referential integrity constraints > and can be implemented in Spark. > Link to the google doc: > [InformationalRIConstraints|https://docs.google.com/document/d/17r-cOqbKF7Px0xb9L7krKg2-RQB_gD2pxOmklm-ehsw/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27471) Reorganize public v2 catalog API
[ https://issues.apache.org/jira/browse/SPARK-27471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27471: -- Target Version/s: 3.1.0 > Reorganize public v2 catalog API > > > Key: SPARK-27471 > URL: https://issues.apache.org/jira/browse/SPARK-27471 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ryan Blue >Priority: Blocker > > In the review for SPARK-27181, Reynold suggested some package moves. We've > decided (at the v2 community sync) not to delay by having this discussion now > because we want to get the new catalog API in so we can work on more logical > plans in parallel. But we do need to make sure we have a sane package scheme > for the next release. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28629) Capture the missing rules in HiveSessionStateBuilder
[ https://issues.apache.org/jira/browse/SPARK-28629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28629: -- Target Version/s: 3.1.0 > Capture the missing rules in HiveSessionStateBuilder > > > Key: SPARK-28629 > URL: https://issues.apache.org/jira/browse/SPARK-28629 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > A common mistake for new contributors is forgetting to add the corresponding > rules to extendedResolutionRules, postHocResolutionRules, or > extendedCheckRules in HiveSessionStateBuilder. We need a way to avoid missing > these rules, or to capture them when they are missing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
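As one possible mitigation (hedged; not necessarily what the ticket proposes), the public SparkSessionExtensions API already lets a rule be injected without touching HiveSessionStateBuilder at all; MyResolutionRule below is a hypothetical Rule[LogicalPlan] supplied by the contributor.
{code:java}
import org.apache.spark.sql.SparkSession

// Inject an analyzer rule through the public extensions hook instead of
// editing the hardcoded rule lists in HiveSessionStateBuilder.
val spark = SparkSession.builder()
  .withExtensions(_.injectResolutionRule(session => new MyResolutionRule(session)))
  .getOrCreate()
{code}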
[jira] [Updated] (SPARK-28717) Update SQL ALTER TABLE RENAME to use TableCatalog API
[ https://issues.apache.org/jira/browse/SPARK-28717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28717: -- Target Version/s: 3.1.0 > Update SQL ALTER TABLE RENAME to use TableCatalog API > -- > > Key: SPARK-28717 > URL: https://issues.apache.org/jira/browse/SPARK-28717 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Edgar Rodriguez >Priority: Major > > Follow-up from SPARK-28265 > SQL implementation of ALTER TABLE RENAME needs to be updated to use the > TableCatalog API operation {{renameTable}} - having something like: > {code:java} > ALTER TABLE [catalog_name] [namespace_name] table_name > TO [new_namespace_name] new_table_name{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25383) Image data source supports sample pushdown
[ https://issues.apache.org/jira/browse/SPARK-25383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25383: -- Target Version/s: 3.1.0 > Image data source supports sample pushdown > -- > > Key: SPARK-25383 > URL: https://issues.apache.org/jira/browse/SPARK-25383 > Project: Spark > Issue Type: New Feature > Components: ML, SQL >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Priority: Major > > After SPARK-25349, we should update image data source to support sampling. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25752) Add trait to easily whitelist logical operators that produce named output from CleanupAliases
[ https://issues.apache.org/jira/browse/SPARK-25752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25752: -- Target Version/s: 3.1.0 > Add trait to easily whitelist logical operators that produce named output > from CleanupAliases > - > > Key: SPARK-25752 > URL: https://issues.apache.org/jira/browse/SPARK-25752 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Minor > > The rule `CleanupAliases` cleans up aliases from logical operators that do > not match a whitelist. This whitelist is hardcoded inside the rule, which is > cumbersome. This PR cleans that up by adding a trait `HasNamedOutput` that > will be ignored by `CleanupAliases`; operators that require aliases to be > preserved should extend it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
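A minimal sketch of the shape of the proposed marker trait; the real change lives inside Catalyst, and this only illustrates the idea.
{code:java}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Operators that must keep their aliases extend the marker trait, and
// CleanupAliases skips any node carrying it instead of consulting a
// hardcoded whitelist.
trait HasNamedOutput { self: LogicalPlan => }
{code}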
[jira] [Updated] (SPARK-27936) Support local dependency uploading from --py-files
[ https://issues.apache.org/jira/browse/SPARK-27936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27936: -- Target Version/s: 3.1.0 > Support local dependency uploading from --py-files > -- > > Key: SPARK-27936 > URL: https://issues.apache.org/jira/browse/SPARK-27936 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Erik Erlandson >Priority: Major > > Support python dependency uploads, as in SPARK-23153 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-22231: -- Target Version/s: 3.1.0 > Support of map, filter, withColumn, dropColumn in nested list of structures > --- > > Key: SPARK-22231 > URL: https://issues.apache.org/jira/browse/SPARK-22231 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.2.0 >Reporter: DB Tsai >Assignee: Jeremy Smith >Priority: Major > > At Netflix's algorithm team, we work on ranking problems to find great > content to fulfill the unique tastes of our members. Before building > recommendation algorithms, we need to prepare the training, testing, and > validation datasets in Apache Spark. Due to the nature of ranking problems, > we have a nested list of items to be ranked in one column, and the top level > is the contexts describing the setting for where a model is to be used (e.g. > profiles, country, time, device, etc.) Here is a blog post describing the > details, [Distributed Time Travel for Feature > Generation|https://medium.com/netflix-techblog/distributed-time-travel-for-feature-generation-389cccdd3907]. > To be more concrete, for the ranks of videos for a given profile_id at a > given country, our data schema can look like this, > {code:java} > root > |-- profile_id: long (nullable = true) > |-- country_iso_code: string (nullable = true) > |-- items: array (nullable = false) > |    |-- element: struct (containsNull = false) > |    |    |-- title_id: integer (nullable = true) > |    |    |-- scores: double (nullable = true) > ... > {code} > We oftentimes need to work on the nested list of structs by applying some > functions on them. Sometimes, we're dropping or adding new columns in the > nested list of structs. Currently, there is no easy solution in open source > Apache Spark to perform those operations using SQL primitives; many people > just convert the data into RDD to work on the nested level of data, and then > reconstruct the new dataframe as a workaround. This is extremely inefficient > because all the optimizations like predicate pushdown in SQL cannot be > performed, we cannot leverage the columnar format, and the serialization > and deserialization cost becomes really huge even if we just want to add a new > column in the nested level. > We built a solution internally at Netflix which we're very happy with. We > plan to make it open source in Spark upstream. We would like to socialize the > API design to see if we missed any use cases. > The first API we added is *mapItems* on DataFrame, which takes a function from > *Column* to *Column* and applies the function to the nested dataframe. Here > is an example, > {code:java} > case class Data(foo: Int, bar: Double, items: Seq[Double]) > val df: Dataset[Data] = spark.createDataset(Seq( > Data(10, 10.0, Seq(10.1, 10.2, 10.3, 10.4)), > Data(20, 20.0, Seq(20.1, 20.2, 20.3, 20.4)) > )) > val result = df.mapItems("items") { > item => item * 2.0 > } > result.printSchema() > // root > // |-- foo: integer (nullable = false) > // |-- bar: double (nullable = false) > // |-- items: array (nullable = true) > // |    |-- element: double (containsNull = true) > result.show() > // +---+----+--------------------+ > // |foo| bar|               items| > // +---+----+--------------------+ > // | 10|10.0|[20.2, 20.4, 20.6...| > // | 20|20.0|[40.2, 40.4, 40.6...| > // +---+----+--------------------+ > {code} > Now, with the ability of applying a function in the nested dataframe, we can > add a new function, *withColumn* in *Column*, to add or replace the existing > column that has the same name in the nested list of structs. Here are two > examples demonstrating the API together with *mapItems*; the first one > replaces the existing column, > {code:java} > case class Item(a: Int, b: Double) > case class Data(foo: Int, bar: Double, items: Seq[Item]) > val df: Dataset[Data] = spark.createDataset(Seq( > Data(10, 10.0, Seq(Item(10, 10.0), Item(11, 11.0))), > Data(20, 20.0, Seq(Item(20, 20.0), Item(21, 21.0))) > )) > val result = df.mapItems("items") { > item => item.withColumn(item("b") + 1 as "b") > } > result.printSchema > // root > // |-- foo: integer (nullable = false) > // |-- bar: double (nullable = false) > // |-- items: array (nullable = true) > // |    |-- element: struct (containsNull = true) > // |    |    |-- a: integer (nullable = true) > // |    |    |-- b: double (nullable = true) > result.show(false) > // +---+----+----------------------+ > // |foo|bar |items                 | > // +---+----+----------------------+ > // |10 |10.0|[[10,11.0], [11,12.0]]| > // |20 |20.0|[[20,21.0], [21,22.0]]| > // +---+----+----------------------+ > {code}
[jira] [Commented] (SPARK-30657) Streaming limit after streaming dropDuplicates can throw error
[ https://issues.apache.org/jira/browse/SPARK-30657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028223#comment-17028223 ] Shixiong Zhu commented on SPARK-30657: -- [~tdas] Makes sense. Agreed that the risk is high but the benefit is pretty low. We can backport it later whenever needed. > Streaming limit after streaming dropDuplicates can throw error > -- > > Key: SPARK-30657 > URL: https://issues.apache.org/jira/browse/SPARK-30657 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, > 2.4.3, 2.4.4 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Critical > Fix For: 3.0.0 > > > {{LocalLimitExec}} does not consume the iterator of the child plan. So if > there is a limit after a stateful operator like streaming dedup in append > mode (e.g. {{streamingdf.dropDuplicates().limit(5)}}), the state changes of > the streaming dedup may not be committed (most stateful ops commit state > changes only after the generated iterator is fully consumed). This leads to > the next batch failing with {{java.lang.IllegalStateException: Error reading > delta file .../N.delta does not exist}} as the state store delta file was > never generated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
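For reproduction purposes, a minimal sketch of the failing pattern described above, assuming a streaming DataFrame `streamingDf`; the sink and query name are illustrative.
{code:java}
// A limit straight after a streaming dedup in append mode -- the shape
// the ticket says leaves state changes uncommitted.
val query = streamingDf.dropDuplicates().limit(5)
  .writeStream
  .outputMode("append")
  .format("memory")
  .queryName("dedup_then_limit")
  .start()
{code}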
[jira] [Comment Edited] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
[ https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028026#comment-17028026 ] Guram Savinov edited comment on SPARK-30701 at 2/1/20 9:38 AM: --- So the problem is: the backslash character isn't included in allowedChars; see the attached HadoopGroupTest.java. This is a Hadoop issue, not a Spark one. was (Author: gsavinov): So the problem is: backslash character isn't included to allowedChars, see attached HadoopGroupTest.java > SQL test running on Windows: hadoop chgrp warnings > -- > > Key: SPARK-30701 > URL: https://issues.apache.org/jira/browse/SPARK-30701 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Windows 10 > Winutils 2.7.1: > [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] > Oracle JavaSE 8 > SparkSQL 2.4.4 / Hadoop 2.6.5 > Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive > Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive >Reporter: Guram Savinov >Assignee: Felix Cheung >Priority: Major > Labels: WIndows, hive, unit-test > Attachments: HadoopGroupTest.java > > > Running SparkSQL local embedded unit tests on Win10, using winutils. > Got warnings about 'hadoop chgrp'. > See environment info. > {code:bash} > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > {code} > Related info on SO: > https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive > Seems like the problem is here: > hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShellPermissions.java:210 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
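A small Scala sketch of the failure mode, hedged as a paraphrase of the Hadoop-side check in FsShellPermissions.java rather than its exact code: a character class without a backslash can never match a Windows domain group name.
{code:java}
// Assumption: the allowed-character class permits spaces on Windows but
// not backslashes, which is what rejects "TEST\Domain users".
val allowedChars = "[-_./@a-zA-Z0-9 ]"
val matches = """TEST\Domain users""".matches(allowedChars + "+")
println(matches) // false -- the backslash is rejected, hence the warning
{code}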
[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
[ https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guram Savinov updated SPARK-30701: -- Environment: Windows 10 Winutils 2.7.1: [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] Oracle JavaSE 8 SparkSQL 2.4.4 / Hadoop 2.6.5 Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive was: Windows 10 Winutils 2.7.1: [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] Oracle JavaSE 8 SparkSQL 2.4.4 Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive > SQL test running on Windows: hadoop chgrp warnings > -- > > Key: SPARK-30701 > URL: https://issues.apache.org/jira/browse/SPARK-30701 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Windows 10 > Winutils 2.7.1: > [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] > Oracle JavaSE 8 > SparkSQL 2.4.4 / Hadoop 2.6.5 > Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive > Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive >Reporter: Guram Savinov >Assignee: Felix Cheung >Priority: Major > Labels: WIndows, hive, unit-test > Attachments: HadoopGroupTest.java > > > Running SparkSQL local embedded unit tests on Win10, using winutils. > Got warnings about 'hadoop chgrp'. > See environment info. > {code:bash} > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > {code} > Related info on SO: > https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive > Seems like the problem is here: > hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShellPermissions.java:210 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30702) Support subexpression elimination in whole stage codegen
Yuming Wang created SPARK-30702: --- Summary: Support subexpression elimination in whole stage codegen Key: SPARK-30702 URL: https://issues.apache.org/jira/browse/SPARK-30702 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang Please see https://github.com/apache/spark/blob/a3a42b30d04009282e770c289b043ca5941e32e5/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala#L2011-L2067 for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
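For illustration, one shape of the duplicated work in question (hedged; see the linked suite for the real cases), assuming a DataFrame `df` with a JSON string column `raw`.
{code:java}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType}

// Both projections contain the identical from_json(...) subexpression,
// which whole-stage codegen currently evaluates once per reference;
// subexpression elimination would compute it once and reuse the result.
val schema = new StructType().add("a", StringType).add("b", StringType)
df.select(
  from_json(col("raw"), schema).getField("a"),
  from_json(col("raw"), schema).getField("b"))
{code}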
[jira] [Commented] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
[ https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028026#comment-17028026 ] Guram Savinov commented on SPARK-30701: --- So the problem is: the backslash character isn't included in allowedChars; see the attached HadoopGroupTest.java > SQL test running on Windows: hadoop chgrp warnings > -- > > Key: SPARK-30701 > URL: https://issues.apache.org/jira/browse/SPARK-30701 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Windows 10 > Winutils 2.7.1: > [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] > Oracle JavaSE 8 > SparkSQL 2.4.4 > Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive > Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive >Reporter: Guram Savinov >Assignee: Felix Cheung >Priority: Major > Labels: WIndows, hive, unit-test > Attachments: HadoopGroupTest.java > > > Running SparkSQL local embedded unit tests on Win10, using winutils. > Got warnings about 'hadoop chgrp'. > See environment info. > {code:bash} > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > {code} > Related info on SO: > https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive > Seems like the problem is here: > hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShellPermissions.java:210 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
[ https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guram Savinov updated SPARK-30701: -- Attachment: HadoopGroupTest.java > SQL test running on Windows: hadoop chgrp warnings > -- > > Key: SPARK-30701 > URL: https://issues.apache.org/jira/browse/SPARK-30701 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Windows 10 > Winutils 2.7.1: > [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] > Oracle JavaSE 8 > SparkSQL 2.4.4 > Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive > Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive >Reporter: Guram Savinov >Assignee: Felix Cheung >Priority: Major > Labels: WIndows, hive, unit-test > Attachments: HadoopGroupTest.java > > > Running SparkSQL local embedded unit tests on Win10, using winutils. > Got warnings about 'hadoop chgrp'. > See environment info. > {code:bash} > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > {code} > Related info on SO: > https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive > Seems like the problem is here: > hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShellPermissions.java:210 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
[ https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guram Savinov updated SPARK-30701: -- Description: Running SparkSQL local embedded unit tests on Win10, using winutils. Got warnings about 'hadoop chgrp'. See environment info. {code:bash} -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... {code} Related info on SO: https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive Seems like the problem is here: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShellPermissions.java:210 was: Running SparkSQL local embedded unit tests on Win10, using winutils. Got warnings about 'hadoop chgrp'. See environment info. {code:bash} -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... {code} Related info on SO: https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive > SQL test running on Windows: hadoop chgrp warnings > -- > > Key: SPARK-30701 > URL: https://issues.apache.org/jira/browse/SPARK-30701 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Windows 10 > Winutils 2.7.1: > [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] > Oracle JavaSE 8 > SparkSQL 2.4.4 > Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive > Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive >Reporter: Guram Savinov >Assignee: Felix Cheung >Priority: Major > Labels: WIndows, hive, unit-test > > Running SparkSQL local embedded unit tests on Win10, using winutils. > Got warnings about 'hadoop chgrp'. > See environment info. > {code:bash} > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > {code} > Related info on SO: > https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive > Seems like the problem is here: > hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShellPermissions.java:210 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
[ https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guram Savinov updated SPARK-30701: -- Description: Running SparkSQL local embedded unit tests on Win10, using winutils. Got warnings about 'hadoop chgrp'. See environment info. {code:bash} -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... {code} Related info on SO: https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive was: Running SparkSQL local embedded unit tests on Win10, using winutils. Got warnings about 'hadoop chgrp'. See environment info. {code:java} -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... {code} Related info on SO: https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive > SQL test running on Windows: hadoop chgrp warnings > -- > > Key: SPARK-30701 > URL: https://issues.apache.org/jira/browse/SPARK-30701 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Windows 10 > Winutils 2.7.1: > [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] > Oracle JavaSE 8 > SparkSQL 2.4.4 > Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive > Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive >Reporter: Guram Savinov >Assignee: Felix Cheung >Priority: Major > Labels: WIndows, hive, unit-test > > Running SparkSQL local embedded unit tests on Win10, using winutils. > Got warnings about 'hadoop chgrp'. > See environment info. > {code:bash} > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > {code} > Related info on SO: > https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
[ https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guram Savinov updated SPARK-30701: -- Description: Running SparkSQL local embedded unit tests on Win10, using winutils. Got warnings about 'hadoop chgrp'. See environment info. {code:java} -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... {code} Related info on SO: https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive was: Running SparkSQL local embedded unit tests on Win10, using winutils. Got warnings about 'hadoop chgrp'. See environment info. {code:java} -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... {code} > SQL test running on Windows: hadoop chgrp warnings > -- > > Key: SPARK-30701 > URL: https://issues.apache.org/jira/browse/SPARK-30701 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Windows 10 > Winutils 2.7.1: > [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] > Oracle JavaSE 8 > SparkSQL 2.4.4 > Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive > Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive >Reporter: Guram Savinov >Assignee: Felix Cheung >Priority: Major > Labels: WIndows, hive, unit-test > > Running SparkSQL local embedded unit tests on Win10, using winutils. > Got warnings about 'hadoop chgrp'. > See environment info. > {code:java} > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > {code} > Related info on SO: > https://stackoverflow.com/questions/48605907/error-in-pyspark-when-insert-data-in-hive -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
[ https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guram Savinov updated SPARK-30701: -- Description: Running SparkSQL local embedded unit tests on Win10, using winutils. Got warnings about 'hadoop chgrp'. See environment info. {code:java} -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... {code} was: Running SparkSQL embedded unit tests on Win10, using winutils. Got warnings about 'hadoop chgrp'. See environment info. {code:java} -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... {code} > SQL test running on Windows: hadoop chgrp warnings > -- > > Key: SPARK-30701 > URL: https://issues.apache.org/jira/browse/SPARK-30701 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Windows 10 > Winutils 2.7.1: > [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] > Oracle JavaSE 8 > SparkSQL 2.4.4 > Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive > Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive >Reporter: Guram Savinov >Assignee: Felix Cheung >Priority: Major > Labels: WIndows, hive, unit-test > > Running SparkSQL local embedded unit tests on Win10, using winutils. > Got warnings about 'hadoop chgrp'. > See environment info. > {code:java} > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
[ https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guram Savinov updated SPARK-30701: -- Environment: Windows 10 Winutils 2.7.1: [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] Oracle JavaSE 8 SparkSQL 2.4.4 Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive was: Windows 10 Winutils 2.7.1: [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] Oracle JavaSE 8 SparkSQL 2.4.4 Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive > SQL test running on Windows: hadoop chgrp warnings > -- > > Key: SPARK-30701 > URL: https://issues.apache.org/jira/browse/SPARK-30701 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Windows 10 > Winutils 2.7.1: > [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] > Oracle JavaSE 8 > SparkSQL 2.4.4 > Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive > Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive >Reporter: Guram Savinov >Assignee: Felix Cheung >Priority: Major > Labels: WIndows, hive, unit-test > > Running SparkSQL unit tests on Win10, using winutils. > Got warnings about 'hadoop chgrp'. > See environment info. > {code:java} > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
[ https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guram Savinov updated SPARK-30701: -- Description: Running SparkSQL embedded unit tests on Win10, using winutils. Got warnings about 'hadoop chgrp'. See environment info. {code:java} -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... {code} was: Running SparkSQL unit tests on Win10, using winutils. Got warnings about 'hadoop chgrp'. See environment info. {code:java} -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... {code} > SQL test running on Windows: hadoop chgrp warnings > -- > > Key: SPARK-30701 > URL: https://issues.apache.org/jira/browse/SPARK-30701 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Windows 10 > Winutils 2.7.1: > [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] > Oracle JavaSE 8 > SparkSQL 2.4.4 > Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive > Set: winutils chmod -R 777 \Users\OSUser\hadoop\tmp\hive >Reporter: Guram Savinov >Assignee: Felix Cheung >Priority: Major > Labels: WIndows, hive, unit-test > > Running SparkSQL embedded unit tests on Win10, using winutils. > Got warnings about 'hadoop chgrp'. > See environment info. > {code:java} > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
[ https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guram Savinov updated SPARK-30701: -- Description: Running SparkSQL unit tests on Win10, using winutils. Got warnings about 'hadoop chgrp'. See environment info. {code:java} -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... {code} was: {code:java} -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... {code} > SQL test running on Windows: hadoop chgrp warnings > -- > > Key: SPARK-30701 > URL: https://issues.apache.org/jira/browse/SPARK-30701 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Windows 10 > Winutils 2.7.1: > [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] > Oracle JavaSE 8 > SparkSQL 2.4.4 > Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive >Reporter: Guram Savinov >Assignee: Felix Cheung >Priority: Major > Labels: WIndows, hive, unit-test > > Running SparkSQL unit tests on Win10, using winutils. > Got warnings about 'hadoop chgrp'. > See environment info. > {code:java} > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
[ https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guram Savinov updated SPARK-30701: -- Description: {code:java} -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'TEST\Domain users' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... {code} was: {code} -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... .-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... .-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... {code} Environment: Windows 10 Winutils 2.7.1: [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] Oracle JavaSE 8 SparkSQL 2.4.4 Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive > SQL test running on Windows: hadoop chgrp warnings > -- > > Key: SPARK-30701 > URL: https://issues.apache.org/jira/browse/SPARK-30701 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Windows 10 > Winutils 2.7.1: > [https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1] > Oracle JavaSE 8 > SparkSQL 2.4.4 > Using: -Dhive.exec.scratchdir=C:\Users\OSUser\hadoop\tmp\hive >Reporter: Guram Savinov >Assignee: Felix Cheung >Priority: Major > Labels: WIndows, hive, unit-test > > {code:java} > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'TEST\Domain users' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
[ https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guram Savinov updated SPARK-30701: -- Affects Version/s: (was: 2.3.0) 2.4.4 > SQL test running on Windows: hadoop chgrp warnings > -- > > Key: SPARK-30701 > URL: https://issues.apache.org/jira/browse/SPARK-30701 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: Guram Savinov >Assignee: Felix Cheung >Priority: Major > Labels: WIndows, hive, unit-test > > {code} > -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > .-chgrp: 'APPVYR-WIN\None' > does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > .-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
[ https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guram Savinov updated SPARK-30701: -- Labels: WIndows hive unit-test (was: bulk-closed) > SQL test running on Windows: hadoop chgrp warnings > -- > > Key: SPARK-30701 > URL: https://issues.apache.org/jira/browse/SPARK-30701 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Guram Savinov >Assignee: Felix Cheung >Priority: Major > Labels: WIndows, hive, unit-test > > {code} > -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > .-chgrp: 'APPVYR-WIN\None' > does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > .-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
[ https://issues.apache.org/jira/browse/SPARK-30701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guram Savinov updated SPARK-30701: -- Component/s: (was: SparkR) > SQL test running on Windows: hadoop chgrp warnings > -- > > Key: SPARK-30701 > URL: https://issues.apache.org/jira/browse/SPARK-30701 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Guram Savinov >Assignee: Felix Cheung >Priority: Major > Labels: bulk-closed > > {code} > -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > .-chgrp: 'APPVYR-WIN\None' > does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > .-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group > Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30701) SQL test running on Windows: hadoop chgrp warnings
Guram Savinov created SPARK-30701: - Summary: SQL test running on Windows: hadoop chgrp warnings Key: SPARK-30701 URL: https://issues.apache.org/jira/browse/SPARK-30701 Project: Spark Issue Type: Bug Components: SparkR, SQL Affects Versions: 2.3.0 Reporter: Guram Savinov Assignee: Felix Cheung {code} -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... .-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... .-chgrp: 'APPVYR-WIN\None' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... -chgrp: 'APPVYR-WIN\None' does not match expected pattern for group Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH... {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org