[jira] [Resolved] (SPARK-30007) Publish snapshot/release with `-Phive-2.3` only
[ https://issues.apache.org/jira/browse/SPARK-30007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-30007. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 26648 [https://github.com/apache/spark/pull/26648] > Publish snapshot/release with `-Phive-2.3` only > --- > > Key: SPARK-30007 > URL: https://issues.apache.org/jira/browse/SPARK-30007 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30007) Publish snapshot/release with `-Phive-2.3` only
[ https://issues.apache.org/jira/browse/SPARK-30007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-30007: - Assignee: Dongjoon Hyun > Publish snapshot/release with `-Phive-2.3` only > --- > > Key: SPARK-30007 > URL: https://issues.apache.org/jira/browse/SPARK-30007 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30007) Publish snapshot/release with `-Phive-2.3` only
[ https://issues.apache.org/jira/browse/SPARK-30007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30007: -- Fix Version/s: (was: 3.1.0) 3.0.0 > Publish snapshot/release with `-Phive-2.3` only > --- > > Key: SPARK-30007 > URL: https://issues.apache.org/jira/browse/SPARK-30007 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30007) Publish snapshot/release with `-Phive-2.3` only
Dongjoon Hyun created SPARK-30007: - Summary: Publish snapshot/release with `-Phive-2.3` only Key: SPARK-30007 URL: https://issues.apache.org/jira/browse/SPARK-30007 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 3.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-29453) Improve tooltip information for SQL tab
[ https://issues.apache.org/jira/browse/SPARK-29453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Raj Boudh reopened SPARK-29453: - Tooltips still need to be added for the Duration field. > Improve tooltip information for SQL tab > --- > > Key: SPARK-29453 > URL: https://issues.apache.org/jira/browse/SPARK-29453 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29992) Make access to getActive public
[ https://issues.apache.org/jira/browse/SPARK-29992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29992. --- Resolution: Won't Do I'm resolving this issue as `Won't Do` for now. > Make access to getActive public > --- > > Key: SPARK-29992 > URL: https://issues.apache.org/jira/browse/SPARK-29992 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.4 >Reporter: Nishkam Ravi >Priority: Major > > Currently a simple get on activeContext is missing. getOrCreate creates a new > Spark context in the absence of an existing one but exposes activeContext > anyway. Use cases that need access to an existing context, if one exists, > without necessarily creating one, would benefit from this being > publicly accessible. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
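For readers with the same need, the recommendation behind this resolution (prefer `SparkSession` over `SparkContext`) can be followed with the public `SparkSession.getActiveSession()` API. The sketch below is illustrative only and assumes PySpark 3.x, where that classmethod is available; the guard logic and app name are made up for the example, not part of this ticket's resolution.

{code:python}
from pyspark.sql import SparkSession

# Minimal sketch: look up an existing session without forcing one to be created.
active = SparkSession.getActiveSession()
if active is not None:
    # Reuse the SparkContext that already backs the active session.
    sc = active.sparkContext
    print("Reusing active application:", sc.applicationId)
else:
    # Only now do we pay the cost of creating a session (and its context).
    spark = SparkSession.builder.appName("get-active-demo").getOrCreate()
    print("Created new application:", spark.sparkContext.applicationId)
{code}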
[jira] [Commented] (SPARK-29992) Make access to getActive public
[ https://issues.apache.org/jira/browse/SPARK-29992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980959#comment-16980959 ] Dongjoon Hyun commented on SPARK-29992: --- Thank you for filing a JIRA and making a PR, [~nishkamravi2]. Although it was not merged, because the Apache Spark community recommends `SparkSession` over `SparkContext`, this issue is a good reference for others who have similar questions. BTW, there is a community guide for `Fix Version`. - https://spark.apache.org/contributing.html {code} Do not set the following fields: - Fix Version. This is assigned by committers only when resolved. - Target Version. This is assigned by committers to indicate a PR has been accepted for possible fix by the target version. {code} > Make access to getActive public > --- > > Key: SPARK-29992 > URL: https://issues.apache.org/jira/browse/SPARK-29992 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.4 >Reporter: Nishkam Ravi >Priority: Major > > Currently a simple get on activeContext is missing. getOrCreate creates a new > Spark context in the absence of an existing one but exposes activeContext > anyway. Use cases that need access to an existing context, if one exists, > without necessarily creating one, would benefit from this being > publicly accessible. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29992) Make access to getActive public
[ https://issues.apache.org/jira/browse/SPARK-29992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29992: -- Fix Version/s: (was: 2.4.5) (was: 3.0.0) > Make access to getActive public > --- > > Key: SPARK-29992 > URL: https://issues.apache.org/jira/browse/SPARK-29992 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.4 >Reporter: Nishkam Ravi >Priority: Major > > Currently a simple get on activeContext is missing. getOrCreate creates a new > Spark context in the absence of an existing one but exposes activeContext > anyway. Use cases that need access to an existing context, if one exists, > without necessarily creating one, would benefit from this being > publicly accessible. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28812) Document SHOW PARTITIONS in SQL Reference.
[ https://issues.apache.org/jira/browse/SPARK-28812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28812. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26635 [https://github.com/apache/spark/pull/26635] > Document SHOW PARTITIONS in SQL Reference. > -- > > Key: SPARK-28812 > URL: https://issues.apache.org/jira/browse/SPARK-28812 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 2.4.3 >Reporter: Dilip Biswal >Assignee: Dilip Biswal >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28812) Document SHOW PARTITIONS in SQL Reference.
[ https://issues.apache.org/jira/browse/SPARK-28812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28812: - Assignee: Dilip Biswal > Document SHOW PARTITIONS in SQL Reference. > -- > > Key: SPARK-28812 > URL: https://issues.apache.org/jira/browse/SPARK-28812 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 2.4.3 >Reporter: Dilip Biswal >Assignee: Dilip Biswal >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29681) Spark Application UI- environment tab field "value" sort is not working
[ https://issues.apache.org/jira/browse/SPARK-29681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29681. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26638 [https://github.com/apache/spark/pull/26638] > Spark Application UI- environment tab field "value" sort is not working > > > Key: SPARK-29681 > URL: https://issues.apache.org/jira/browse/SPARK-29681 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.4, 3.0.0 >Reporter: jobit mathew >Priority: Major > Fix For: 3.0.0 > > Attachments: Ascend.png, DESCEND.png > > > Spark Application UI- > In the Environment tab, sorting the "value" column in ascending or > descending order does not work. > !Ascend.png|width=854,height=475! > > !DESCEND.png|width=854,height=470! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30004) Allow UserDefinedType to be merged into a standard DateType
[ https://issues.apache.org/jira/browse/SPARK-30004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980941#comment-16980941 ] Takeshi Yamamuro commented on SPARK-30004: -- I reset the target version; you don't need to set it because that field is filled in when the issue is resolved. > Allow UserDefinedType to be merged into a standard DateType > --- > > Key: SPARK-30004 > URL: https://issues.apache.org/jira/browse/SPARK-30004 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4 >Reporter: Fokko Driesprong >Priority: Major > > I've registered a custom type, namely XMLGregorianCalendar, which is used by > Scalaxb, an XML data-binding tool for generating case classes based on an XSD. > I want to convert the XMLGregorianCalendar to a regular TimestampType. > This works, but when I update the table (using Delta), I get an error: > Failed to merge fields 'START_DATE_MAINTENANCE_FLPL' and > 'START_DATE_MAINTENANCE_FLPL'. > Failed to merge incompatible data types TimestampType and > org.apache.spark.sql.types.CustomXMLGregorianCalendarType@5ff12345;; > There are two ways of fixing this: > * Add a rule which compares the sqlType. > * Change the compare function so it checks the rhs against the lhs, so I can > override the equals function on the UserDefinedType -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30004) Allow UserDefinedType to be merged into a standard DateType
[ https://issues.apache.org/jira/browse/SPARK-30004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-30004: - Affects Version/s: (was: 2.4.4) 3.0.0 > Allow UserDefinedType to be merged into a standard DateType > --- > > Key: SPARK-30004 > URL: https://issues.apache.org/jira/browse/SPARK-30004 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Fokko Driesprong >Priority: Major > > I've registered a custom type, namely XMLGregorianCalendar, which is used by > Scalaxb, an XML data-binding tool for generating case classes based on an XSD. > I want to convert the XMLGregorianCalendar to a regular TimestampType. > This works, but when I update the table (using Delta), I get an error: > Failed to merge fields 'START_DATE_MAINTENANCE_FLPL' and > 'START_DATE_MAINTENANCE_FLPL'. > Failed to merge incompatible data types TimestampType and > org.apache.spark.sql.types.CustomXMLGregorianCalendarType@5ff12345;; > There are two ways of fixing this: > * Add a rule which compares the sqlType. > * Change the compare function so it checks the rhs against the lhs, so I can > override the equals function on the UserDefinedType -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30004) Allow UserDefinedType to be merged into a standard DateType
[ https://issues.apache.org/jira/browse/SPARK-30004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-30004: - Fix Version/s: (was: 2.4.5) > Allow UserDefinedType to be merged into a standard DateType > --- > > Key: SPARK-30004 > URL: https://issues.apache.org/jira/browse/SPARK-30004 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4 >Reporter: Fokko Driesprong >Priority: Major > > I've registered a custom type, namely XMLGregorianCalendar, which is used by > Scalaxb, an XML data-binding tool for generating case classes based on an XSD. > I want to convert the XMLGregorianCalendar to a regular TimestampType. > This works, but when I update the table (using Delta), I get an error: > Failed to merge fields 'START_DATE_MAINTENANCE_FLPL' and > 'START_DATE_MAINTENANCE_FLPL'. > Failed to merge incompatible data types TimestampType and > org.apache.spark.sql.types.CustomXMLGregorianCalendarType@5ff12345;; > There are two ways of fixing this: > * Add a rule which compares the sqlType. > * Change the compare function so it checks the rhs against the lhs, so I can > override the equals function on the UserDefinedType -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29970) open/close state is not preserved for Timelineview
[ https://issues.apache.org/jira/browse/SPARK-29970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29970. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26607 [https://github.com/apache/spark/pull/26607] > open/close state is not preserved for Timelineview > -- > > Key: SPARK-29970 > URL: https://issues.apache.org/jira/browse/SPARK-29970 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.0.0 > > > The open/close state of the Timeline view is intended to be preserved, as it is for > other UI features like the DAG visualization, but it is not actually preserved. So if we > go back to the All Jobs page, a Job page, or a Stage page from another page, the Timeline > view is always closed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30006) printSchema indeterministic output
[ https://issues.apache.org/jira/browse/SPARK-30006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hasil Sharma updated SPARK-30006: - Description: printSchema doesn't give a consistent output in following example. {code:python} from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession.builder.appName("new-session").getOrCreate() l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)] rdd = spark.sparkContext.parallelize(l) people = rdd.map(lambda x: Row(name=x[0], age=int(x[1]))) df1 = spark.createDataFrame(people) print(df1.printSchema()) df2 = df1.select("name", "age") print(df2.printSchema()) {code} first print outputs {noformat} root |– age: long (nullable = true) |– name: string (nullable = true) {noformat} second print outputs {noformat} root |– name: string (nullable = true) |– age: long (nullable = true) {noformat} Expectation: The output should be same because the column names are same. was: printSchema doesn't give a consistent output in following example. {code:python} from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession.builder.appName("new-session").getOrCreate() l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)] rdd = spark.sparkContext.parallelize(l) people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1]))) df1 = spark.createDataFrame(people_1) print(df1.printSchema()) df2 = df1.select("name", "age") print(df2.printSchema()) {code} first print outputs {noformat} root |– age: long (nullable = true) |– name: string (nullable = true) {noformat} second print outputs {noformat} root |– name: string (nullable = true) |– age: long (nullable = true) {noformat} Expectation: The output should be same because the column names are same. > printSchema indeterministic output > -- > > Key: SPARK-30006 > URL: https://issues.apache.org/jira/browse/SPARK-30006 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4 >Reporter: Hasil Sharma >Priority: Minor > > printSchema doesn't give a consistent output in following example. > > {code:python} > from pyspark.sql import SparkSession > from pyspark.sql import Row > spark = SparkSession.builder.appName("new-session").getOrCreate() > l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)] > rdd = spark.sparkContext.parallelize(l) > people = rdd.map(lambda x: Row(name=x[0], age=int(x[1]))) > df1 = spark.createDataFrame(people) > print(df1.printSchema()) > df2 = df1.select("name", "age") > print(df2.printSchema()) > {code} > > first print outputs > {noformat} > root > |– age: long (nullable = true) > |– name: string (nullable = true) > {noformat} > > second print outputs > {noformat} > root > |– name: string (nullable = true) > |– age: long (nullable = true) > {noformat} > Expectation: The output should be same because the column names are same. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
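As the snippet above shows, the observed ordering comes from how the rows are constructed: in the affected versions, `Row` built from keyword arguments sorts its fields by name, while `select("name", "age")` uses the declared order. Note also that `printSchema()` prints to stdout and returns None, so wrapping it in `print()` adds a stray "None" line. One way to pin the column order is to pass an explicit schema to `createDataFrame`; the sketch below is a minimal illustration of that workaround (the app name is made up), not part of the reported issue.

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("schema-order-demo").getOrCreate()

# Declaring the schema explicitly pins the column order, so printSchema()
# is stable no matter how the rows were built.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", LongType(), True),
])
data = [("Ankit", 25), ("Jalfaizy", 22), ("saurabh", 20), ("Bala", 26)]

df1 = spark.createDataFrame(data, schema)
df1.printSchema()                          # name first, then age
df1.select("name", "age").printSchema()    # same order as above
{code}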
[jira] [Updated] (SPARK-30006) printSchema indeterministic output
[ https://issues.apache.org/jira/browse/SPARK-30006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hasil Sharma updated SPARK-30006: - Description: printSchema doesn't give a consistent output in following example. {code:python} from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession.builder.appName("new-session").getOrCreate() l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)] rdd = spark.sparkContext.parallelize(l) people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1]))) df1 = spark.createDataFrame(people_1) print(df1.printSchema()) df2 = df1.select("name", "age") print(df2.printSchema()) {code} first print outputs {noformat} root |– age: long (nullable = true) |– name: string (nullable = true) {noformat} second print outputs {noformat} root |– name: string (nullable = true) |– age: long (nullable = true) {noformat} Expectation: The output should be same because the column names are same. was: printSchema doesn't give a consistent output in following example. ```python from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession.builder.appName("new-session").getOrCreate() l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)] rdd = spark.sparkContext.parallelize(l) people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1]))) df1 = spark.createDataFrame(people_1) print(df1.printSchema()) df2 = df1.select("name", "age") print(df2.printSchema()) ``` first print outputs ```root |– age: long (nullable = true)| |– name: string (nullable = true)|``` second print outputs ```root |– name: string (nullable = true)| |– age: long (nullable = true)|``` Expectation: The output should be same because the column names are same. > printSchema indeterministic output > -- > > Key: SPARK-30006 > URL: https://issues.apache.org/jira/browse/SPARK-30006 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4 >Reporter: Hasil Sharma >Priority: Minor > > printSchema doesn't give a consistent output in following example. > > {code:python} > from pyspark.sql import SparkSession > from pyspark.sql import Row > spark = SparkSession.builder.appName("new-session").getOrCreate() > l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)] > rdd = spark.sparkContext.parallelize(l) > people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1]))) > df1 = spark.createDataFrame(people_1) > print(df1.printSchema()) > df2 = df1.select("name", "age") > print(df2.printSchema()) > {code} > > first print outputs > {noformat} > root > |– age: long (nullable = true) > |– name: string (nullable = true) > {noformat} > > second print outputs > {noformat} > root > |– name: string (nullable = true) > |– age: long (nullable = true) > {noformat} > Expectation: The output should be same because the column names are same. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30006) printSchema indeterministic output
[ https://issues.apache.org/jira/browse/SPARK-30006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hasil Sharma updated SPARK-30006: - Description: printSchema doesn't give a consistent output in following example. ```python from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession.builder.appName("new-session").getOrCreate() l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)] rdd = spark.sparkContext.parallelize(l) people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1]))) df1 = spark.createDataFrame(people_1) print(df1.printSchema()) df2 = df1.select("name", "age") print(df2.printSchema()) ``` first print outputs ```root |– age: long (nullable = true)| |– name: string (nullable = true)|``` second print outputs ```root |– name: string (nullable = true)| |– age: long (nullable = true)|``` Expectation: The output should be same because the column names are same. was: printSchema doesn't give a consistent output in following example. ```python from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession.builder.appName("new-session").getOrCreate() l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)] rdd = spark.sparkContext.parallelize(l) people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1]))) df1 = spark.createDataFrame(people_1) print(df1.printSchema()) df2 = df1.select("name", "age") print(df2.printSchema())``` first print outputs ``` ``` second print outputs ``` root |– name: string (nullable = true)| |– age: long (nullable = true)| ``` Expectation: The output should be same because the column names are same. > printSchema indeterministic output > -- > > Key: SPARK-30006 > URL: https://issues.apache.org/jira/browse/SPARK-30006 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4 >Reporter: Hasil Sharma >Priority: Minor > > printSchema doesn't give a consistent output in following example. > > ```python > from pyspark.sql import SparkSession > from pyspark.sql import Row > spark = SparkSession.builder.appName("new-session").getOrCreate() > l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)] > rdd = spark.sparkContext.parallelize(l) > people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1]))) > df1 = spark.createDataFrame(people_1) > print(df1.printSchema()) > df2 = df1.select("name", "age") > print(df2.printSchema()) > ``` > > first print outputs > ```root > |– age: long (nullable = true)| > |– name: string (nullable = true)|``` > > second print outputs > ```root > |– name: string (nullable = true)| > |– age: long (nullable = true)|``` > Expectation: The output should be same because the column names are same. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30006) printSchema indeterministic output
[ https://issues.apache.org/jira/browse/SPARK-30006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hasil Sharma updated SPARK-30006: - Description: printSchema doesn't give a consistent output in following example. ```python from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession.builder.appName("new-session").getOrCreate() l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)] rdd = spark.sparkContext.parallelize(l) people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1]))) df1 = spark.createDataFrame(people_1) print(df1.printSchema()) df2 = df1.select("name", "age") print(df2.printSchema())``` first print outputs ``` ``` second print outputs ``` root |– name: string (nullable = true)| |– age: long (nullable = true)| ``` Expectation: The output should be same because the column names are same. was: printSchema doesn't give a consistent output in following example. ```python from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession.builder.appName("new-session").getOrCreate() l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)] rdd = spark.sparkContext.parallelize(l) people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1]))) df1 = spark.createDataFrame(people_1) print(df1.printSchema()) df2 = df1.select("name", "age") print(df2.printSchema()) ``` first print outputs ``` root |-- age: long (nullable = true) |-- name: string (nullable = true) ``` second print outputs ``` root |-- name: string (nullable = true) |-- age: long (nullable = true) ``` Expectation: The output should be same because the column names are same. > printSchema indeterministic output > -- > > Key: SPARK-30006 > URL: https://issues.apache.org/jira/browse/SPARK-30006 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4 >Reporter: Hasil Sharma >Priority: Minor > > printSchema doesn't give a consistent output in following example. > > ```python > from pyspark.sql import SparkSession > from pyspark.sql import Row > spark = SparkSession.builder.appName("new-session").getOrCreate() > l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)] > rdd = spark.sparkContext.parallelize(l) > people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1]))) > df1 = spark.createDataFrame(people_1) > print(df1.printSchema()) > df2 = df1.select("name", "age") > print(df2.printSchema())``` > > first print outputs > ``` > > ``` > > second print outputs > ``` > root > |– name: string (nullable = true)| > |– age: long (nullable = true)| > ``` > Expectation: The output should be same because the column names are same. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30006) printSchema indeterministic output
Hasil Sharma created SPARK-30006: Summary: printSchema indeterministic output Key: SPARK-30006 URL: https://issues.apache.org/jira/browse/SPARK-30006 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4 Reporter: Hasil Sharma printSchema doesn't give a consistent output in following example. ```python from pyspark.sql import SparkSession from pyspark.sql import Row spark = SparkSession.builder.appName("new-session").getOrCreate() l = [('Ankit',25),('Jalfaizy',22),('saurabh',20),('Bala',26)] rdd = spark.sparkContext.parallelize(l) people_1 = rdd.map(lambda x: Row(name=x[0], age=int(x[1]))) df1 = spark.createDataFrame(people_1) print(df1.printSchema()) df2 = df1.select("name", "age") print(df2.printSchema()) ``` first print outputs ``` root |-- age: long (nullable = true) |-- name: string (nullable = true) ``` second print outputs ``` root |-- name: string (nullable = true) |-- age: long (nullable = true) ``` Expectation: The output should be same because the column names are same. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29988) Adjust Jenkins jobs for `hive-1.2/2.3` combination
[ https://issues.apache.org/jira/browse/SPARK-29988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29988: -- Description: We need to rename the following Jenkins jobs first. spark-master-test-sbt-hadoop-2.7 -> spark-master-test-sbt-hadoop-2.7-hive-1.2 spark-master-test-sbt-hadoop-3.2 -> spark-master-test-sbt-hadoop-3.2-hive-2.3 spark-master-test-maven-hadoop-2.7 -> spark-master-test-maven-hadoop-2.7-hive-1.2 spark-master-test-maven-hadoop-3.2 -> spark-master-test-maven-hadoop-3.2-hive-2.3 Also, we need to add `-Phive-1.2` for the existing `hadoop-2.7` jobs. {code} -Phive \ +-Phive-1.2 \ {code} And, we need to add `-Phive-2.3` for the existing `hadoop-3.2` jobs. {code} -Phive \ +-Phive-2.3 \ {code} For now, I added the above `-Phive-1.2` and `-Phive-2.3` to the Jenkins jobs manually. (This should be added to the SCM of the AmpLab Jenkins.) After SPARK-29981, we need to create two new jobs. - spark-master-test-sbt-hadoop-2.7-hive-2.3 - spark-master-test-maven-hadoop-2.7-hive-2.3 This is in preparation for Apache Spark 3.0.0. We may drop all `*-hive-1.2` jobs at Apache Spark 3.1.0. was: We need to rename the following Jenkins jobs first. spark-master-test-sbt-hadoop-2.7 -> spark-master-test-sbt-hadoop-2.7-hive-1.2 spark-master-test-sbt-hadoop-3.2 -> spark-master-test-sbt-hadoop-3.2-hive-2.3 spark-master-test-maven-hadoop-2.7 -> spark-master-test-maven-hadoop-2.7-hive-1.2 spark-master-test-maven-hadoop-3.2 -> spark-master-test-maven-hadoop-3.2-hive-2.3 After SPARK-29981, we need to create two new jobs. - spark-master-test-sbt-hadoop-2.7-hive-2.3 - spark-master-test-maven-hadoop-2.7-hive-2.3 This is in preparation for Apache Spark 3.0.0. We may drop all `*-hive-1.2` jobs at Apache Spark 3.1.0. > Adjust Jenkins jobs for `hive-1.2/2.3` combination > -- > > Key: SPARK-29988 > URL: https://issues.apache.org/jira/browse/SPARK-29988 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Shane Knapp >Priority: Major > > We need to rename the following Jenkins jobs first. > spark-master-test-sbt-hadoop-2.7 -> spark-master-test-sbt-hadoop-2.7-hive-1.2 > spark-master-test-sbt-hadoop-3.2 -> spark-master-test-sbt-hadoop-3.2-hive-2.3 > spark-master-test-maven-hadoop-2.7 -> > spark-master-test-maven-hadoop-2.7-hive-1.2 > spark-master-test-maven-hadoop-3.2 -> > spark-master-test-maven-hadoop-3.2-hive-2.3 > Also, we need to add `-Phive-1.2` for the existing `hadoop-2.7` jobs. > {code} > -Phive \ > +-Phive-1.2 \ > {code} > And, we need to add `-Phive-2.3` for the existing `hadoop-3.2` jobs. > {code} > -Phive \ > +-Phive-2.3 \ > {code} > For now, I added the above `-Phive-1.2` and `-Phive-2.3` to the Jenkins jobs > manually. (This should be added to the SCM of the AmpLab Jenkins.) > After SPARK-29981, we need to create two new jobs. > - spark-master-test-sbt-hadoop-2.7-hive-2.3 > - spark-master-test-maven-hadoop-2.7-hive-2.3 > This is in preparation for Apache Spark 3.0.0. > We may drop all `*-hive-1.2` jobs at Apache Spark 3.1.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30005) Update `test-dependencies.sh` to check `hive-1.2/2.3` profile
Dongjoon Hyun created SPARK-30005: - Summary: Update `test-dependencies.sh` to check `hive-1.2/2.3` profile Key: SPARK-30005 URL: https://issues.apache.org/jira/browse/SPARK-30005 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 3.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30004) Allow UserDefinedType to be merged into a standard DateType
[ https://issues.apache.org/jira/browse/SPARK-30004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated SPARK-30004: - Component/s: (was: Spark Core) SQL > Allow UserDefinedType to be merged into a standard DateType > --- > > Key: SPARK-30004 > URL: https://issues.apache.org/jira/browse/SPARK-30004 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4 >Reporter: Fokko Driesprong >Priority: Major > Fix For: 2.4.5 > > > I've registered a custom type, namely XMLGregorianCalendar, which is used by > Scalaxb, an XML data-binding tool for generating case classes based on an XSD. > I want to convert the XMLGregorianCalendar to a regular TimestampType. > This works, but when I update the table (using Delta), I get an error: > Failed to merge fields 'START_DATE_MAINTENANCE_FLPL' and > 'START_DATE_MAINTENANCE_FLPL'. > Failed to merge incompatible data types TimestampType and > org.apache.spark.sql.types.CustomXMLGregorianCalendarType@5ff12345;; > There are two ways of fixing this: > * Add a rule which compares the sqlType. > * Change the compare function so it checks the rhs against the lhs, so I can > override the equals function on the UserDefinedType -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0
[ https://issues.apache.org/jira/browse/SPARK-29931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29931. --- Resolution: Later > Declare all SQL legacy configs as will be removed in Spark 4.0 > -- > > Key: SPARK-29931 > URL: https://issues.apache.org/jira/browse/SPARK-29931 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > > Add the sentence to the descriptions of all legacy SQL configs that existed before > Spark 3.0: "This config will be removed in Spark 4.0." Here is the list of > such configs: > * spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName > * spark.sql.legacy.literal.pickMinimumPrecision > * spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation > * spark.sql.legacy.sizeOfNull > * spark.sql.legacy.replaceDatabricksSparkAvro.enabled > * spark.sql.legacy.setopsPrecedence.enabled > * spark.sql.legacy.integralDivide.returnBigint > * spark.sql.legacy.bucketedTableScan.outputOrdering > * spark.sql.legacy.parser.havingWithoutGroupByAsWhere > * spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue > * spark.sql.legacy.setCommandRejectsSparkCoreConfs > * spark.sql.legacy.utcTimestampFunc.enabled > * spark.sql.legacy.typeCoercion.datetimeToString > * spark.sql.legacy.looseUpcast > * spark.sql.legacy.ctePrecedence.enabled > * spark.sql.legacy.arrayExistsFollowsThreeValuedLogic -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
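Since the legacy flags listed above are ordinary SQL configs, they can be toggled at runtime for as long as they exist; the sketch below uses one key from the list purely as an example (the app name is made up, and per this ticket such flags may be dropped in a later release).

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("legacy-config-demo").getOrCreate()

# Example only: enable one of the legacy behaviours listed above.
# With this flag set to true, size(NULL) returns -1 instead of NULL.
spark.conf.set("spark.sql.legacy.sizeOfNull", "true")
print(spark.conf.get("spark.sql.legacy.sizeOfNull"))

spark.sql("SELECT size(CAST(NULL AS ARRAY<INT>)) AS s").show()
{code}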
[jira] [Assigned] (SPARK-29981) Add `hive-1.2` and `hive-2.3` profiles
[ https://issues.apache.org/jira/browse/SPARK-29981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-29981: - Assignee: Dongjoon Hyun > Add `hive-1.2` and `hive-2.3` profiles > -- > > Key: SPARK-29981 > URL: https://issues.apache.org/jira/browse/SPARK-29981 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29981) Add `hive-1.2` and `hive-2.3` profiles
[ https://issues.apache.org/jira/browse/SPARK-29981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29981. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26619 [https://github.com/apache/spark/pull/26619] > Add `hive-1.2` and `hive-2.3` profiles > -- > > Key: SPARK-29981 > URL: https://issues.apache.org/jira/browse/SPARK-29981 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30004) Allow UserDefinedType to be merged into a standard DateType
Fokko Driesprong created SPARK-30004: Summary: Allow UserDefinedType to be merged into a standard DateType Key: SPARK-30004 URL: https://issues.apache.org/jira/browse/SPARK-30004 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.4.4 Reporter: Fokko Driesprong Fix For: 2.4.5 I've registered a custom type, namely XMLGregorianCalendar, which is used by Scalaxb, an XML data-binding tool for generating case classes based on an XSD. I want to convert the XMLGregorianCalendar to a regular TimestampType. This works, but when I update the table (using Delta), I get an error: Failed to merge fields 'START_DATE_MAINTENANCE_FLPL' and 'START_DATE_MAINTENANCE_FLPL'. Failed to merge incompatible data types TimestampType and org.apache.spark.sql.types.CustomXMLGregorianCalendarType@5ff12345;; There are two ways of fixing this: * Add a rule which compares the sqlType. * Change the compare function so it checks the rhs against the lhs, so I can override the equals function on the UserDefinedType -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
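To make the report concrete, here is a rough PySpark sketch of the kind of user-defined type being described. UserDefinedType is an internal/developer API, and the class name and method bodies below are invented for illustration; this is not the reporter's actual CustomXMLGregorianCalendarType, only an example of a UDT whose underlying sqlType is TimestampType, which is exactly the pair the schema merge above refuses to reconcile.

{code:python}
# Illustrative sketch only: UserDefinedType is an internal/developer API in
# pyspark.sql.types, and this class is made up for the example.
from pyspark.sql.types import UserDefinedType, TimestampType


class CalendarUDT(UserDefinedType):
    """A UDT stored as a plain timestamp underneath."""

    @classmethod
    def sqlType(cls):
        # Schema merging currently compares the UDT itself against a bare
        # TimestampType instead of comparing this underlying sqlType.
        return TimestampType()

    @classmethod
    def module(cls):
        return "__main__"

    def serialize(self, obj):
        # Store the wrapped value as its timestamp representation (sketch).
        return obj

    def deserialize(self, datum):
        return datum
{code}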
[jira] [Resolved] (SPARK-30003) Prevent stack overflow in unknown hint resolution
[ https://issues.apache.org/jira/browse/SPARK-30003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-30003. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26642 [https://github.com/apache/spark/pull/26642] > Prevent stack overflow in unknown hint resolution > -- > > Key: SPARK-30003 > URL: https://issues.apache.org/jira/browse/SPARK-30003 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > > This was caused by SPARK-28746. See > https://github.com/apache/spark/pull/25464/files#r349543286 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
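For background, hint resolution is the analyzer step exercised by calls like the ones below. The snippet only shows where a recognised and an unrecognised hint enter that path (the unknown hint name and app name are made up); it does not reproduce the stack overflow fixed by this ticket.

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hint-demo").getOrCreate()
df = spark.range(10)

# A known hint is resolved by the analyzer.
df.hint("broadcast").explain()

# An unknown hint name (made up here) goes through the unresolved-hint path
# that this ticket hardens; Spark warns and drops it rather than failing.
df.hint("no_such_hint").explain()
{code}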
[jira] [Assigned] (SPARK-30003) Prevent stack overflow in unknown hint resolution
[ https://issues.apache.org/jira/browse/SPARK-30003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-30003: Assignee: Hyukjin Kwon > Prevent stack overflow in unknown hint resolution > -- > > Key: SPARK-30003 > URL: https://issues.apache.org/jira/browse/SPARK-30003 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > This was caused by SPARK-28746. See > https://github.com/apache/spark/pull/25464/files#r349543286 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org