[jira] [Commented] (SPARK-34276) Check the unreleased/unresolved JIRAs/PRs of Parquet 1.11 and 1.12

2021-10-07 Thread Micah Kornfield (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425959#comment-17425959
 ] 

Micah Kornfield commented on SPARK-34276:
-

Sorry for the late reply. PARQUET-2089 has been a long-standing bug in the C++ 
implementation where we were setting file_offset to the beginning of the 
column_chunk metadata and not to the actual data page. It's not clear to me 
whether this was a problem in practice before parquet-mr 1.12. [~gershinsky] 
Would the fix in PARQUET-2078 make parquet-mr resilient to this bug?
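
For anyone triaging affected files, the relevant offsets can be read straight 
from the Parquet footer. A minimal sketch with pyarrow (the file path is an 
illustrative assumption, not part of this issue):

{code:python}
import pyarrow.parquet as pq

# Hypothetical path; point this at a file written by the affected C++ writer.
meta = pq.ParquetFile("example.parquet").metadata

for rg in range(meta.num_row_groups):
    for col in range(meta.num_columns):
        chunk = meta.row_group(rg).column(col)
        # The first page of a chunk is the dictionary page if present,
        # otherwise the first data page.
        first_page = chunk.dictionary_page_offset or chunk.data_page_offset
        if chunk.file_offset != first_page:
            print(f"row group {rg}, column {col}: "
                  f"file_offset={chunk.file_offset}, first page={first_page}")
{code}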

> Check the unreleased/unresolved JIRAs/PRs of Parquet 1.11 and 1.12
> --
>
> Key: SPARK-34276
> URL: https://issues.apache.org/jira/browse/SPARK-34276
> Project: Spark
>  Issue Type: Task
>  Components: Build, SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Chao Sun
>Priority: Blocker
>
> Before the release, we need to double check the unreleased/unresolved 
> JIRAs/PRs of Parquet 1.11/1.12 and then decide whether we should 
> upgrade/revert Parquet. At the same time, we should encourage the whole 
> community to do the compatibility and performance tests for their production 
> workloads, including both read and write code paths.
> More details: 
> [https://github.com/apache/spark/pull/26804#issuecomment-768790620]






[jira] [Updated] (SPARK-35531) Can not insert into hive bucket table if create table with upper case schema

2021-10-07 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35531:
---
Affects Version/s: 3.0.0
   3.1.1

> Can not insert into hive bucket table if create table with upper case schema
> 
>
> Key: SPARK-35531
> URL: https://issues.apache.org/jira/browse/SPARK-35531
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.1, 3.2.0
>Reporter: Hongyi Zhang
>Priority: Major
>
>  
>  
> create table TEST1(
>  V1 BIGINT,
>  S1 INT)
>  partitioned by (PK BIGINT)
>  clustered by (V1)
>  sorted by (S1)
>  into 200 buckets
>  STORED AS PARQUET;
>  
> insert into test1
>  select
>  * from values(1,1,1);
>  
>  
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]
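
The error text suggests the bucket column name ("V1") is compared 
case-sensitively against the lower-cased field names stored in the Hive 
metastore ("v1"). Assuming that is the cause (not confirmed in this report), a 
possible workaround is to declare the columns in lower case; a minimal sketch:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Possible workaround (assumption): lower-case column names so they match the
# lower-cased names recorded in the Hive metastore.
spark.sql("""
    CREATE TABLE test1_lc(
      v1 BIGINT,
      s1 INT)
    PARTITIONED BY (pk BIGINT)
    CLUSTERED BY (v1)
    SORTED BY (s1)
    INTO 200 BUCKETS
    STORED AS PARQUET
""")
spark.sql("INSERT INTO test1_lc SELECT * FROM VALUES (1, 1, 1)")
{code}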






[jira] [Commented] (SPARK-35531) Can not insert into hive bucket table if create table with upper case schema

2021-10-07 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425954#comment-17425954
 ] 

Gengliang Wang commented on SPARK-35531:


I can reproduce the issue on 3.0.0 and 3.1.1. 
It's a long-standing bug.

> Can not insert into hive bucket table if create table with upper case schema
> 
>
> Key: SPARK-35531
> URL: https://issues.apache.org/jira/browse/SPARK-35531
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Hongyi Zhang
>Priority: Major
>
>  
>  
> create table TEST1(
>  V1 BIGINT,
>  S1 INT)
>  partitioned by (PK BIGINT)
>  clustered by (V1)
>  sorted by (S1)
>  into 200 buckets
>  STORED AS PARQUET;
>  
> insert into test1
>  select
>  * from values(1,1,1);
>  
>  
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]






[jira] [Commented] (SPARK-36952) Inline type hints for python/pyspark/resource/information.py and python/pyspark/resource/profile.py

2021-10-07 Thread dch nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425949#comment-17425949
 ] 

dch nguyen commented on SPARK-36952:


working on this

> Inline type hints for python/pyspark/resource/information.py and 
> python/pyspark/resource/profile.py
> ---
>
> Key: SPARK-36952
> URL: https://issues.apache.org/jira/browse/SPARK-36952
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> Inline type hints for python/pyspark/resource/information.py and 
> python/pyspark/resource/profile.py






[jira] [Commented] (SPARK-36952) Inline type hints for python/pyspark/resource/information.py and python/pyspark/resource/profile.py

2021-10-07 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425947#comment-17425947
 ] 

dgd_contributor commented on SPARK-36952:
-

working on this

> Inline type hints for python/pyspark/resource/information.py and 
> python/pyspark/resource/profile.py
> ---
>
> Key: SPARK-36952
> URL: https://issues.apache.org/jira/browse/SPARK-36952
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> Inline type hints for python/pyspark/resource/information.py and 
> python/pyspark/resource/profile.py






[jira] [Issue Comment Deleted] (SPARK-36952) Inline type hints for python/pyspark/resource/information.py and python/pyspark/resource/profile.py

2021-10-07 Thread dgd_contributor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dgd_contributor updated SPARK-36952:

Comment: was deleted

(was: working on this)

> Inline type hints for python/pyspark/resource/information.py and 
> python/pyspark/resource/profile.py
> ---
>
> Key: SPARK-36952
> URL: https://issues.apache.org/jira/browse/SPARK-36952
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>
> Inline type hints for python/pyspark/resource/information.py and 
> python/pyspark/resource/profile.py






[jira] [Commented] (SPARK-36953) Expose SQL state and error class in PySpark exceptions

2021-10-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425945#comment-17425945
 ] 

Apache Spark commented on SPARK-36953:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/34219

> Expose SQL state and error class in PySpark exceptions
> --
>
> Key: SPARK-36953
> URL: https://issues.apache.org/jira/browse/SPARK-36953
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> SPARK-34920 introduced error classes and states but they are not accessible in 
> PySpark. We should make both available in PySpark.






[jira] [Assigned] (SPARK-36953) Expose SQL state and error class in PySpark exceptions

2021-10-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36953:


Assignee: (was: Apache Spark)

> Expose SQL state and error class in PySpark exceptions
> --
>
> Key: SPARK-36953
> URL: https://issues.apache.org/jira/browse/SPARK-36953
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> SPARK-34920 introduced error classes and states but they are not accessible in 
> PySpark. We should make both available in PySpark.






[jira] [Assigned] (SPARK-36953) Expose SQL state and error class in PySpark exceptions

2021-10-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36953:


Assignee: Apache Spark

> Expose SQL state and error class in PySpark exceptions
> --
>
> Key: SPARK-36953
> URL: https://issues.apache.org/jira/browse/SPARK-36953
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> SPARK-34920 introduced error classes and states but they are not accessible in 
> PySpark. We should make both available in PySpark.






[jira] [Commented] (SPARK-36953) Expose SQL state and error class in PySpark exceptions

2021-10-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425944#comment-17425944
 ] 

Apache Spark commented on SPARK-36953:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/34219

> Expose SQL state and error class in PySpark exceptions
> --
>
> Key: SPARK-36953
> URL: https://issues.apache.org/jira/browse/SPARK-36953
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> SPARK-34920 introduced error classes and states but they are not accessible in 
> PySpark. We should make both available in PySpark.






[jira] [Created] (SPARK-36953) Expose SQL state and error class in PySpark exceptions

2021-10-07 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-36953:


 Summary: Expose SQL state and error class in PySpark exceptions
 Key: SPARK-36953
 URL: https://issues.apache.org/jira/browse/SPARK-36953
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Affects Versions: 3.3.0
Reporter: Hyukjin Kwon


SPARK-34920 introduced error classes and states but they are not accessible in 
PySpark. We should make both available in PySpark.
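
A minimal sketch of what exposing these fields on the Python side could look 
like; the wrapper class and accessor names below are hypothetical illustrations, 
not the actual PySpark API:

{code:python}
class CapturedAnalysisError(Exception):
    """Hypothetical Python-side wrapper for a JVM AnalysisException."""

    def __init__(self, desc, error_class=None, sql_state=None):
        super().__init__(desc)
        self._error_class = error_class
        self._sql_state = sql_state

    def getErrorClass(self):
        # e.g. "MISSING_COLUMN"; None if the error carries no error class
        return self._error_class

    def getSqlState(self):
        # e.g. "42000"; None if the error carries no SQL state
        return self._sql_state


# Hypothetical usage on the caller side:
try:
    raise CapturedAnalysisError("column not found", "MISSING_COLUMN", "42000")
except CapturedAnalysisError as e:
    print(e.getErrorClass(), e.getSqlState())
{code}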






[jira] [Commented] (SPARK-35531) Can not insert into hive bucket table if create table with upper case schema

2021-10-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425932#comment-17425932
 ] 

Apache Spark commented on SPARK-35531:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/34218

> Can not insert into hive bucket table if create table with upper case schema
> 
>
> Key: SPARK-35531
> URL: https://issues.apache.org/jira/browse/SPARK-35531
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Hongyi Zhang
>Priority: Major
>
>  
>  
> create table TEST1(
>  V1 BIGINT,
>  S1 INT)
>  partitioned by (PK BIGINT)
>  clustered by (V1)
>  sorted by (S1)
>  into 200 buckets
>  STORED AS PARQUET;
>  
> insert into test1
>  select
>  * from values(1,1,1);
>  
>  
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]






[jira] [Created] (SPARK-36952) Inline type hints for python/pyspark/resource/information.py and python/pyspark/resource/profile.py

2021-10-07 Thread dgd_contributor (Jira)
dgd_contributor created SPARK-36952:
---

 Summary: Inline type hints for 
python/pyspark/resource/information.py and python/pyspark/resource/profile.py
 Key: SPARK-36952
 URL: https://issues.apache.org/jira/browse/SPARK-36952
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dgd_contributor


Inline type hints for python/pyspark/resource/information.py and 
python/pyspark/resource/profile.py






[jira] [Updated] (SPARK-36903) oom exception occurred during code generation due to a large number of case when branches

2021-10-07 Thread JacobZheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JacobZheng updated SPARK-36903:
---
Description: 
I have a Spark task that contains many CASE WHEN branches. When I run it, the 
driver throws an OOM exception in the codegen phase. What I would like to know 
is whether it is possible to detect or limit this in the codegen phase to avoid 
the OOM.

 

I see that Spark 2.2 had a configuration item 
spark.sql.codegen.maxCaseBranches. Would it help my situation if I added this 
limit back?
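
For illustration, a query of this shape can be built in PySpark by chaining a 
large number of WHEN clauses; the column name and branch count below are 
hypothetical, only meant to show the kind of expression that makes the 
generated code very large:

{code:python}
from functools import reduce

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(1000).withColumnRenamed("id", "v")

# CASE WHEN v = 0 THEN 'b0' WHEN v = 1 THEN 'b1' ... ELSE 'other' END
n_branches = 500  # hypothetical; large enough to produce huge generated code
expr = reduce(
    lambda acc, i: acc.when(F.col("v") == i, F.lit(f"b{i}")),
    range(1, n_branches),
    F.when(F.col("v") == 0, F.lit("b0")),
).otherwise(F.lit("other"))

df.select(expr.alias("bucket")).show()  # codegen runs when the plan executes
{code}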

 

This is the stack information I see via jstack:
{code:java}
"SparkJobEngine-akka.actor.default-dispatcher-9" #23010 prio=5 os_prio=0 
cpu=197487.25ms elapsed=7213.71s tid=0x7fb08c019800 nid=0x5fb9 runnable 
[0x7fb072af2000] java.lang.Thread.State: RUNNABLE at 
scala.collection.immutable.StringLike$$Lambda$1790/0x000840ee4840.apply(Unknown
 Source) at scala.collection.Iterator.foreach(Iterator.scala:941) at 
scala.collection.Iterator.foreach$(Iterator.scala:941) at 
scala.collection.AbstractIterator.foreach(Iterator.scala:1429) at 
scala.collection.immutable.StringLike.stripMargin(StringLike.scala:187) at 
scala.collection.immutable.StringLike.stripMargin$(StringLike.scala:185) at 
scala.collection.immutable.StringOps.stripMargin(StringOps.scala:33) at 
org.apache.spark.sql.catalyst.expressions.codegen.Block.toString(javaCode.scala:142)
 at 
org.apache.spark.sql.catalyst.expressions.codegen.Block.toString$(javaCode.scala:141)
 at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeBlock.toString(javaCode.scala:286)
 at 
org.apache.spark.sql.catalyst.expressions.codegen.Block.length(javaCode.scala:149)
 at 
org.apache.spark.sql.catalyst.expressions.codegen.Block.length$(javaCode.scala:149)
 at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeBlock.length(javaCode.scala:286)
 at 
org.apache.spark.sql.catalyst.expressions.Expression.reduceCodeSize(Expression.scala:160)
 at 
org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:147)
 at 
org.apache.spark.sql.catalyst.expressions.Expression$$Lambda$2784/0x00084131b840.apply(Unknown
 Source) at scala.Option.getOrElse(Option.scala:189) at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:141)
 at 
org.apache.spark.sql.catalyst.expressions.And.doGenCode(predicates.scala:567) 
at 
org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:146)
 at 
org.apache.spark.sql.catalyst.expressions.Expression$$Lambda$2784/0x00084131b840.apply(Unknown
 Source) at scala.Option.getOrElse(Option.scala:189) at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:141)
 at 
org.apache.spark.sql.catalyst.expressions.CaseWhen.$anonfun$multiBranchesCodegen$1(conditionalExpressions.scala:209)
 at 
org.apache.spark.sql.catalyst.expressions.CaseWhen$$Lambda$4626/0x0008415b8840.apply(Unknown
 Source) at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) at 
scala.collection.TraversableLike$$Lambda$83/0x0008401bc040.apply(Unknown 
Source) at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
scala.collection.TraversableLike.map(TraversableLike.scala:238) at 
scala.collection.TraversableLike.map$(TraversableLike.scala:231) at 
scala.collection.AbstractTraversable.map(Traversable.scala:108) at 
org.apache.spark.sql.catalyst.expressions.CaseWhen.multiBranchesCodegen(conditionalExpressions.scala:208)
 at 
org.apache.spark.sql.catalyst.expressions.CaseWhen.doGenCode(conditionalExpressions.scala:291)
 at 
org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:146)
 at 
org.apache.spark.sql.catalyst.expressions.Expression$$Lambda$2784/0x00084131b840.apply(Unknown
 Source) at scala.Option.getOrElse(Option.scala:189) at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:141)
 at 
org.apache.spark.sql.catalyst.expressions.Concat.$anonfun$doGenCode$22(collectionOperations.scala:2120)
 at 
org.apache.spark.sql.catalyst.expressions.Concat$$Lambda$5022/0x000841a60840.apply(Unknown
 Source) at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) at 
scala.collection.TraversableLike$$Lambda$83/0x0008401bc040.apply(Unknown 
Source) at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
scala.collection.TraversableLike.map(TraversableLike.scala:238) at 
scala.collection.TraversableLike.map$(TraversableLike.scala:231) at 
scala.collection.AbstractTraversable.map(Traversable.scala:108) at 
org.apache.spark.sql.cata

[jira] [Updated] (SPARK-36903) oom exception occurred during code generation due to a large number of case when branches

2021-10-07 Thread JacobZheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JacobZheng updated SPARK-36903:
---
Description: 
I have a Spark task that contains many CASE WHEN branches. When I run it, the 
driver throws an OOM exception in the codegen phase. What I would like to know 
is whether it is possible to detect or limit this in the codegen phase to avoid 
the OOM.

 

I see that Spark 2.2 had a configuration item 
spark.sql.codegen.maxCaseBranches. Would it help my situation if I added this 
limit back?

 

This is the stack information I see via jstack:
{code:java}
"SparkJobEngine-akka.actor.default-dispatcher-9" #23010 prio=5 os_prio=0 
cpu=197487.25ms elapsed=7213.71s tid=0x7fb08c019800 nid=0x5fb9 runnable 
[0x7fb072af2000]  java.lang.Thread.State: RUNNABLE  at 
scala.collection.immutable.StringLike$$Lambda$1790/0x000840ee4840.apply(Unknown
 Source)  at scala.collection.Iterator.foreach(Iterator.scala:941)  at 
scala.collection.Iterator.foreach$(Iterator.scala:941)  at 
scala.collection.AbstractIterator.foreach(Iterator.scala:1429)  at 
scala.collection.immutable.StringLike.stripMargin(StringLike.scala:187)  at 
scala.collection.immutable.StringLike.stripMargin$(StringLike.scala:185)  at 
scala.collection.immutable.StringOps.stripMargin(StringOps.scala:33)  at 
org.apache.spark.sql.catalyst.expressions.codegen.Block.toString(javaCode.scala:142)
  at 
org.apache.spark.sql.catalyst.expressions.codegen.Block.toString$(javaCode.scala:141)
  at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeBlock.toString(javaCode.scala:286)
  at 
org.apache.spark.sql.catalyst.expressions.codegen.Block.length(javaCode.scala:149)
  at 
org.apache.spark.sql.catalyst.expressions.codegen.Block.length$(javaCode.scala:149)
  at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeBlock.length(javaCode.scala:286)
  at 
org.apache.spark.sql.catalyst.expressions.Expression.reduceCodeSize(Expression.scala:160)
  at 
org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:147)
  at 
org.apache.spark.sql.catalyst.expressions.Expression$$Lambda$2784/0x00084131b840.apply(Unknown
 Source)  at scala.Option.getOrElse(Option.scala:189)  at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:141)
  at 
org.apache.spark.sql.catalyst.expressions.And.doGenCode(predicates.scala:567)  
at 
org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:146)
  at 
org.apache.spark.sql.catalyst.expressions.Expression$$Lambda$2784/0x00084131b840.apply(Unknown
 Source)  at scala.Option.getOrElse(Option.scala:189)  at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:141)
  at 
org.apache.spark.sql.catalyst.expressions.CaseWhen.$anonfun$multiBranchesCodegen$1(conditionalExpressions.scala:209)
  at 
org.apache.spark.sql.catalyst.expressions.CaseWhen$$Lambda$4626/0x0008415b8840.apply(Unknown
 Source)  at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)  at 
scala.collection.TraversableLike$$Lambda$83/0x0008401bc040.apply(Unknown 
Source)  at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)  at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)  at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)  at 
scala.collection.TraversableLike.map(TraversableLike.scala:238)  at 
scala.collection.TraversableLike.map$(TraversableLike.scala:231)  at 
scala.collection.AbstractTraversable.map(Traversable.scala:108)  at 
org.apache.spark.sql.catalyst.expressions.CaseWhen.multiBranchesCodegen(conditionalExpressions.scala:208)
  at 
org.apache.spark.sql.catalyst.expressions.CaseWhen.doGenCode(conditionalExpressions.scala:291)
  at 
org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:146)
  at 
org.apache.spark.sql.catalyst.expressions.Expression$$Lambda$2784/0x00084131b840.apply(Unknown
 Source)  at scala.Option.getOrElse(Option.scala:189)  at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:141)
  at 
org.apache.spark.sql.catalyst.expressions.Concat.$anonfun$doGenCode$22(collectionOperations.scala:2120)
  at 
org.apache.spark.sql.catalyst.expressions.Concat$$Lambda$5022/0x000841a60840.apply(Unknown
 Source)  at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)  at 
scala.collection.TraversableLike$$Lambda$83/0x0008401bc040.apply(Unknown 
Source)  at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)  at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)  at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)  at 
scala.collection.TraversableLike.map(TraversableLike.scala:238)  at 
scala.collection.TraversableLike.map$(TraversableLike.scala:231)  at 
scala.collection.AbstractTraversable.map(Tr

[jira] [Commented] (SPARK-36839) Add daily build with Hadoop 2 profile in GitHub Actions build

2021-10-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425879#comment-17425879
 ] 

Apache Spark commented on SPARK-36839:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/34217

> Add daily build with Hadoop 2 profile in GitHub Actions build
> -
>
> Key: SPARK-36839
> URL: https://issues.apache.org/jira/browse/SPARK-36839
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.3.0
>
>
> We have faced problems such as SPARK-36820 due to the missing build with the 
> Hadoop 2 profile. We should at least add a daily build in GitHub Actions for that.






[jira] [Commented] (SPARK-36839) Add daily build with Hadoop 2 profile in GitHub Actions build

2021-10-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425878#comment-17425878
 ] 

Apache Spark commented on SPARK-36839:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/34217

> Add daily build with Hadoop 2 profile in GitHub Actions build
> -
>
> Key: SPARK-36839
> URL: https://issues.apache.org/jira/browse/SPARK-36839
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.3.0
>
>
> We have faced problems such as SPARK-36820 due to the missing build with the 
> Hadoop 2 profile. We should at least add a daily build in GitHub Actions for that.






[jira] [Commented] (SPARK-36950) Normalize semi-structured data into a flat table.

2021-10-07 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425873#comment-17425873
 ] 

Hyukjin Kwon commented on SPARK-36950:
--

Thanks [~bjornjorgensen]

> Normalize semi-structured data into a flat table.
> -
>
> Key: SPARK-36950
> URL: https://issues.apache.org/jira/browse/SPARK-36950
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> Hi, in pandas there is this json_normalize function that flattens out nested data.
> https://github.com/pandas-dev/pandas/blob/v1.3.3/pandas/io/json/_normalize.py#L112-L353
>  
> I have opened a request for this function in Koalas. It would be good to bring 
> this function over to PySpark, where more people could use it.
> https://github.com/databricks/koalas/issues/2162
> This is also a function that geopandas uses. In the meantime I have found a 
> gist with code that flattens out the whole dataframe.
> https://gist.github.com/nmukerje/e65cde41be85470e4b8dfd9a2d6aed50 






[jira] [Updated] (SPARK-36947) Exception when trying to access Row field using getAs method

2021-10-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-36947:
-
Priority: Major  (was: Blocker)

> Exception when trying to access Row field using getAs method
> 
>
> Key: SPARK-36947
> URL: https://issues.apache.org/jira/browse/SPARK-36947
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
> Environment: Spark 3.1.2 (but this also may affect other versions as 
> well)
>Reporter: Alexandros Mavrommatis
>Priority: Major
>  Labels: catalyst, row, sql
>
> I have an input dataframe *df* with the following schema:
> {code:java}
> |-- origin: string (nullable = true)
> |-- product: struct (nullable = true)
> ||-- id: integer (nullable = true){code}
>  
> When I try to select the first 20 rows of the id column, I execute:
> {code:java}
> df.select("product.id").show(20, false)
> {code}
>  
> and I manage to get the result. But when I execute the following: 
> {code:java}
> df.map(_.getAs[Int]("product.id")).show(20, false){code}
>  
> I get the following error:
> {code:java}
> java.lang.IllegalArgumentException: Field "product.id" does not exist.{code}
>  






[jira] [Updated] (SPARK-36950) Normalize semi-structured data into a flat table.

2021-10-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-36950:
-
Issue Type: Improvement  (was: Wish)

> Normalize semi-structured data into a flat table.
> -
>
> Key: SPARK-36950
> URL: https://issues.apache.org/jira/browse/SPARK-36950
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> Hi, in pandas there is this json_normalize function that flattens out nested data.
> https://github.com/pandas-dev/pandas/blob/v1.3.3/pandas/io/json/_normalize.py#L112-L353
>  
> I have opened a request for this function in Koalas. It would be good to bring 
> this function over to PySpark, where more people could use it.
> https://github.com/databricks/koalas/issues/2162
> This is also a function that geopandas uses. In the meantime I have found a 
> gist with code that flattens out the whole dataframe.
> https://gist.github.com/nmukerje/e65cde41be85470e4b8dfd9a2d6aed50 






[jira] [Resolved] (SPARK-29871) Flaky test: ImageFileFormatTest.test_read_images

2021-10-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-29871.
--
Fix Version/s: 3.3.0
 Assignee: Hyukjin Kwon
   Resolution: Fixed

> Flaky test: ImageFileFormatTest.test_read_images
> 
>
> Key: SPARK-29871
> URL: https://issues.apache.org/jira/browse/SPARK-29871
> Project: Spark
>  Issue Type: Test
>  Components: ML
>Affects Versions: 3.0.0, 3.1.2, 3.2.0
>Reporter: wuyi
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.3.0
>
>
> Running tests...
> --
>  test_read_images (pyspark.ml.tests.test_image.ImageFileFormatTest) ... ERROR 
> (12.050s)
> ==
> ERROR [12.050s]: test_read_images 
> (pyspark.ml.tests.test_image.ImageFileFormatTest)
> --
> Traceback (most recent call last):
>  File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/ml/tests/test_image.py",
>  line 35, in test_read_images
>  self.assertEqual(df.count(), 4)
>  File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/dataframe.py",
>  line 507, in count
>  return int(self._jdf.count())
>  File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py",
>  line 1286, in __call__
>  answer, self.gateway_client, self.target_id, self.name)
>  File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/utils.py",
>  line 98, in deco
>  return f(*a, **kw)
>  File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py",
>  line 328, in get_return_value
>  format(target_id, ".", name), value)
> py4j.protocol.Py4JJavaError: An error occurred while calling o32.count.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 
> in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 
> (TID 1, amp-jenkins-worker-05.amp, executor driver): 
> javax.imageio.IIOException: Unsupported Image Type
>  at 
> com.sun.imageio.plugins.jpeg.JPEGImageReader.readInternal(JPEGImageReader.java:1079)
>  at 
> com.sun.imageio.plugins.jpeg.JPEGImageReader.read(JPEGImageReader.java:1050)
>  at javax.imageio.ImageIO.read(ImageIO.java:1448)
>  at javax.imageio.ImageIO.read(ImageIO.java:1352)
>  at org.apache.spark.ml.image.ImageSchema$.decode(ImageSchema.scala:134)
>  at 
> org.apache.spark.ml.source.image.ImageFileFormat.$anonfun$buildReader$2(ImageFileFormat.scala:84)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:147)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:132)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
>  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(generated.java:33)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:63)
>  at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:726)
>  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>  at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:132)
>  at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>  at org.apache.spark.scheduler.Task.run(Task.scala:127)
>  at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:462)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:465)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
>  at 
> org.apache.spark.scheduler.DAG

[jira] [Commented] (SPARK-29871) Flaky test: ImageFileFormatTest.test_read_images

2021-10-07 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425869#comment-17425869
 ] 

Hyukjin Kwon commented on SPARK-29871:
--

Fixed in https://github.com/apache/spark/pull/34187

> Flaky test: ImageFileFormatTest.test_read_images
> 
>
> Key: SPARK-29871
> URL: https://issues.apache.org/jira/browse/SPARK-29871
> Project: Spark
>  Issue Type: Test
>  Components: ML
>Affects Versions: 3.0.0, 3.1.2, 3.2.0
>Reporter: wuyi
>Priority: Major
>
> Running tests...
> --
>  test_read_images (pyspark.ml.tests.test_image.ImageFileFormatTest) ... ERROR 
> (12.050s)
> ==
> ERROR [12.050s]: test_read_images 
> (pyspark.ml.tests.test_image.ImageFileFormatTest)
> --
> Traceback (most recent call last):
>  File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/ml/tests/test_image.py",
>  line 35, in test_read_images
>  self.assertEqual(df.count(), 4)
>  File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/dataframe.py",
>  line 507, in count
>  return int(self._jdf.count())
>  File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py",
>  line 1286, in __call__
>  answer, self.gateway_client, self.target_id, self.name)
>  File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/utils.py",
>  line 98, in deco
>  return f(*a, **kw)
>  File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py",
>  line 328, in get_return_value
>  format(target_id, ".", name), value)
> py4j.protocol.Py4JJavaError: An error occurred while calling o32.count.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 
> in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 
> (TID 1, amp-jenkins-worker-05.amp, executor driver): 
> javax.imageio.IIOException: Unsupported Image Type
>  at 
> com.sun.imageio.plugins.jpeg.JPEGImageReader.readInternal(JPEGImageReader.java:1079)
>  at 
> com.sun.imageio.plugins.jpeg.JPEGImageReader.read(JPEGImageReader.java:1050)
>  at javax.imageio.ImageIO.read(ImageIO.java:1448)
>  at javax.imageio.ImageIO.read(ImageIO.java:1352)
>  at org.apache.spark.ml.image.ImageSchema$.decode(ImageSchema.scala:134)
>  at 
> org.apache.spark.ml.source.image.ImageFileFormat.$anonfun$buildReader$2(ImageFileFormat.scala:84)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:147)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:132)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
>  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(generated.java:33)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:63)
>  at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:726)
>  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>  at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:132)
>  at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>  at org.apache.spark.scheduler.Task.run(Task.scala:127)
>  at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:462)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:465)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
>  at 
> org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAG

[jira] [Created] (SPARK-36951) Inline type hints for python/pyspark/sql/column.py

2021-10-07 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-36951:


 Summary: Inline type hints for python/pyspark/sql/column.py
 Key: SPARK-36951
 URL: https://issues.apache.org/jira/browse/SPARK-36951
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Xinrong Meng


Inline type hints for python/pyspark/sql/column.py to enable type checking of 
function bodies.






[jira] [Commented] (SPARK-36951) Inline type hints for python/pyspark/sql/column.py

2021-10-07 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425794#comment-17425794
 ] 

Xinrong Meng commented on SPARK-36951:
--

I am working on this.

> Inline type hints for python/pyspark/sql/column.py
> --
>
> Key: SPARK-36951
> URL: https://issues.apache.org/jira/browse/SPARK-36951
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Inline type hints for python/pyspark/sql/column.py to enable type checking of 
> function bodies.






[jira] [Created] (SPARK-36950) Normalize semi-structured data into a flat table.

2021-10-07 Thread Jira
Bjørn Jørgensen created SPARK-36950:
---

 Summary: Normalize semi-structured data into a flat table.
 Key: SPARK-36950
 URL: https://issues.apache.org/jira/browse/SPARK-36950
 Project: Spark
  Issue Type: Wish
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Bjørn Jørgensen


Hi, in pandas there is this json_normalize function that flattens out nested data.

https://github.com/pandas-dev/pandas/blob/v1.3.3/pandas/io/json/_normalize.py#L112-L353

I have opened a request for this function in Koalas. It would be good to bring 
this function over to PySpark, where more people could use it.

https://github.com/databricks/koalas/issues/2162

This is also a function that geopandas uses. In the meantime I have found a 
gist with code that flattens out the whole dataframe.

https://gist.github.com/nmukerje/e65cde41be85470e4b8dfd9a2d6aed50
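
For reference, this is the pandas behaviour being requested; a minimal sketch 
(the sample records are made up for illustration):

{code:python}
import pandas as pd

# Hypothetical nested records of the kind json_normalize flattens.
data = [
    {"id": 1, "product": {"name": "a", "price": {"amount": 10, "currency": "USD"}}},
    {"id": 2, "product": {"name": "b", "price": {"amount": 20, "currency": "EUR"}}},
]

flat = pd.json_normalize(data, sep="_")
print(flat.columns.tolist())
# e.g. ['id', 'product_name', 'product_price_amount', 'product_price_currency']
{code}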






[jira] [Commented] (SPARK-36936) spark-hadoop-cloud broken on release and only published via 3rd party repositories

2021-10-07 Thread Colin Williams (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425764#comment-17425764
 ] 

Colin Williams commented on SPARK-36936:


[~csun] When I look at SPARK-35844 I see version 3.2.0 for the jar, but that 
version does not appear to have been published.



2021.10.07 12:39:03 INFO [warn] Note: Unresolved dependencies path:
2021.10.07 12:39:03 INFO [error] sbt.librarymanagement.ResolveException: Error 
downloading org.apache.spark:spark-hadoop-cloud_2.12:3.2.0
2021.10.07 12:39:03 INFO [error] Not found
2021.10.07 12:39:03 INFO [error] Not found
2021.10.07 12:39:03 INFO [error] not found: 
/home/colin/.ivy2/local/org.apache.spark/spark-hadoop-cloud_2.12/3.2.0/ivys/ivy.xml
2021.10.07 12:39:03 INFO [error] not found: 
https://repo1.maven.org/maven2/org/apache/spark/spark-hadoop-cloud_2.12/3.2.0/spark-hadoop-cloud_2.12-3.2.0.pom
2021.10.07 12:39:03 INFO [error] not found: 
https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/spark/spark-hadoop-cloud_2.12/3.2.0/spark-hadoop-cloud_2.12-3.2.0.pom

> spark-hadoop-cloud broken on release and only published via 3rd party 
> repositories
> --
>
> Key: SPARK-36936
> URL: https://issues.apache.org/jira/browse/SPARK-36936
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.1.1, 3.1.2
> Environment: name:=spark-demo
> version := "0.0.1"
> scalaVersion := "2.12.12"
> lazy val app = (project in file("app")).settings(
>  assemblyPackageScala / assembleArtifact := false,
>  assembly / assemblyJarName := "uber.jar",
>  assembly / mainClass := Some("com.example.Main"),
>  // more settings here ...
>  )
> resolvers += "Cloudera" at 
> "https://repository.cloudera.com/artifactory/cloudera-repos/";
> libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.1.2" % 
> "provided"
> libraryDependencies += "org.apache.spark" %% "spark-hadoop-cloud" % 
> "3.1.1.3.1.7270.0-253"
> libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % 
> "3.1.1.7.2.7.0-184"
> libraryDependencies += "com.amazonaws" % "aws-java-sdk-bundle" % "1.11.901"
> libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.1" % "test"
> // test suite settings
> fork in Test := true
> javaOptions ++= Seq("-Xms512M", "-Xmx2048M", "-XX:MaxPermSize=2048M", 
> "-XX:+CMSClassUnloadingEnabled")
> // Show runtime of tests
> testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest, "-oD")
> ___
>  
> import org.apache.spark.sql.SparkSession
> object SparkApp {
>  def main(args: Array[String]){
>  val spark = SparkSession.builder().master("local")
>  //.config("spark.jars.repositories", 
> "https://repository.cloudera.com/artifactory/cloudera-repos/";)
>  //.config("spark.jars.packages", 
> "org.apache.spark:spark-hadoop-cloud_2.12:3.1.1.3.1.7270.0-253")
>  .appName("spark session").getOrCreate
>  val jsonDF = spark.read.json("s3a://path-to-bucket/compact.json")
>  val csvDF = spark.read.format("csv").load("s3a://path-to-bucket/some.csv")
>  jsonDF.show()
>  csvDF.show()
>  }
> }
>Reporter: Colin Williams
>Priority: Major
>
> The Spark documentation suggests using `spark-hadoop-cloud` to read / write 
> from S3 in [https://spark.apache.org/docs/latest/cloud-integration.html]. 
> However, artifacts are currently published only via 3rd-party resolvers in 
> [https://mvnrepository.com/artifact/org.apache.spark/spark-hadoop-cloud], 
> including Cloudera and Palantir.
>  
> In effect, the Apache Spark documentation is pointing users to a 3rd-party 
> solution for object stores, including S3. Furthermore, if you follow the 
> instructions and include one of the 3rd-party jars, i.e. the Cloudera jar, 
> with the Spark 3.1.2 release and try to access an object store, the following 
> exception is returned.
>  
> ```
> Exception in thread "main" java.lang.NoSuchMethodError: 'void 
> com.google.common.base.Preconditions.checkArgument(boolean, java.lang.String, 
> java.lang.Object, java.lang.Object)'
>  at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:894)
>  at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:870)
>  at 
> org.apache.hadoop.fs.s3a.S3AUtils.getEncryptionAlgorithm(S3AUtils.java:1605)
>  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:363)
>  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
>  at 
> org.apache

[jira] [Created] (SPARK-36949) Fix CREATE TABLE AS SELECT of ANSI intervals

2021-10-07 Thread Max Gekk (Jira)
Max Gekk created SPARK-36949:


 Summary: Fix CREATE TABLE AS SELECT of ANSI intervals
 Key: SPARK-36949
 URL: https://issues.apache.org/jira/browse/SPARK-36949
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Max Gekk


The given SQL should work:

{code:sql}
spark-sql> CREATE TABLE tbl1 STORED AS PARQUET AS SELECT INTERVAL '1-1' YEAR TO 
MONTH AS YM;
21/10/07 21:35:59 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
since hive.security.authorization.manager is set to instance of 
HiveAuthorizerFactory.
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.IllegalArgumentException: Error: type expected at the position 0 of 
'interval year to month' but 'interval year to month' is found
{code}







[jira] [Resolved] (SPARK-36940) Inline type hints for python/pyspark/sql/avro/functions.py

2021-10-07 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-36940.
---
Fix Version/s: 3.3.0
 Assignee: Xinrong Meng
   Resolution: Fixed

Issue resolved by pull request 34200
https://github.com/apache/spark/pull/34200

> Inline type hints for python/pyspark/sql/avro/functions.py
> --
>
> Key: SPARK-36940
> URL: https://issues.apache.org/jira/browse/SPARK-36940
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.3.0
>
>
> Inline type hints for python/pyspark/sql/avro/functions.py.
>  
> Currently, we use stub files for type annotations, which don't support type 
> checks within function bodies. So we inline type hints to support that.
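
For context, a minimal sketch of the difference, using a made-up helper (not a 
real PySpark function): with a stub (.pyi) file, type checkers validate callers 
but skip the unannotated body in the .py module, whereas inlined hints let the 
body be checked as well:

{code:python}
# Before: annotations live only in a separate stub file, e.g. functions.pyi:
#
#   def to_upper_names(names: List[str]) -> str: ...
#
# After: the same hints written inline in functions.py, so mypy checks the body.
from typing import List


def to_upper_names(names: List[str]) -> str:  # hypothetical example
    # mypy can now verify the body, e.g. that .upper() is valid on each element
    return ", ".join(n.upper() for n in names)
{code}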






[jira] [Commented] (SPARK-36942) Inline type hints for python/pyspark/sql/readwriter.py

2021-10-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425733#comment-17425733
 ] 

Apache Spark commented on SPARK-36942:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/34216

> Inline type hints for python/pyspark/sql/readwriter.py
> --
>
> Key: SPARK-36942
> URL: https://issues.apache.org/jira/browse/SPARK-36942
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Inline type hints for python/pyspark/sql/readwriter.py.






[jira] [Commented] (SPARK-36942) Inline type hints for python/pyspark/sql/readwriter.py

2021-10-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425732#comment-17425732
 ] 

Apache Spark commented on SPARK-36942:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/34216

> Inline type hints for python/pyspark/sql/readwriter.py
> --
>
> Key: SPARK-36942
> URL: https://issues.apache.org/jira/browse/SPARK-36942
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Inline type hints for python/pyspark/sql/readwriter.py.






[jira] [Assigned] (SPARK-36942) Inline type hints for python/pyspark/sql/readwriter.py

2021-10-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36942:


Assignee: Apache Spark

> Inline type hints for python/pyspark/sql/readwriter.py
> --
>
> Key: SPARK-36942
> URL: https://issues.apache.org/jira/browse/SPARK-36942
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Inline type hints for python/pyspark/sql/readwriter.py.






[jira] [Assigned] (SPARK-36942) Inline type hints for python/pyspark/sql/readwriter.py

2021-10-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36942:


Assignee: (was: Apache Spark)

> Inline type hints for python/pyspark/sql/readwriter.py
> --
>
> Key: SPARK-36942
> URL: https://issues.apache.org/jira/browse/SPARK-36942
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Inline type hints for python/pyspark/sql/readwriter.py.






[jira] [Assigned] (SPARK-36948) Check CREATE TABLE with ANSI intervals using Hive external catalog and Parquet

2021-10-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36948:


Assignee: Max Gekk  (was: Apache Spark)

> Check CREATE TABLE with ANSI intervals using Hive external catalog and Parquet
> --
>
> Key: SPARK-36948
> URL: https://issues.apache.org/jira/browse/SPARK-36948
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Add a test which checks:
> 1. Creating a table with ANSI interval columns
> 2. INSERT INTO the table
> 3. Read inserted values back






[jira] [Commented] (SPARK-36948) Check CREATE TABLE with ANSI intervals using Hive external catalog and Parquet

2021-10-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425727#comment-17425727
 ] 

Apache Spark commented on SPARK-36948:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/34215

> Check CREATE TABLE with ANSI intervals using Hive external catalog and Parquet
> --
>
> Key: SPARK-36948
> URL: https://issues.apache.org/jira/browse/SPARK-36948
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Add a test which checks:
> 1. Creating a table with ANSI interval columns
> 2. INSERT INTO the table
> 3. Read inserted values back






[jira] [Assigned] (SPARK-36948) Check CREATE TABLE with ANSI intervals using Hive external catalog and Parquet

2021-10-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36948:


Assignee: Apache Spark  (was: Max Gekk)

> Check CREATE TABLE with ANSI intervals using Hive external catalog and Parquet
> --
>
> Key: SPARK-36948
> URL: https://issues.apache.org/jira/browse/SPARK-36948
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Add a test which checks:
> 1. Creating a table with ANSI interval columns
> 2. INSERT INTO the table
> 3. Read inserted values back



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36948) Check CREATE TABLE with ANSI intervals using Hive external catalog and Parquet

2021-10-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425726#comment-17425726
 ] 

Apache Spark commented on SPARK-36948:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/34215

> Check CREATE TABLE with ANSI intervals using Hive external catalog and Parquet
> --
>
> Key: SPARK-36948
> URL: https://issues.apache.org/jira/browse/SPARK-36948
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Add a test which checks:
> 1. Creating a table with ANSI interval columns
> 2. INSERT INTO the table
> 3. Read inserted values back



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36948) Check CREATE TABLE with ANSI intervals using Hive external catalog and Parquet

2021-10-07 Thread Max Gekk (Jira)
Max Gekk created SPARK-36948:


 Summary: Check CREATE TABLE with ANSI intervals using Hive 
external catalog and Parquet
 Key: SPARK-36948
 URL: https://issues.apache.org/jira/browse/SPARK-36948
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Max Gekk
Assignee: Max Gekk


Add a test which checks:
1. Creating a table with ANSI interval columns
2. INSERT INTO the table
3. Read inserted values back



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36947) Exception when trying to access Row field using getAs method

2021-10-07 Thread Alexandros Mavrommatis (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandros Mavrommatis updated SPARK-36947:
---
Description: 
I have an input dataframe *df* with the following schema:
{code:java}
|-- origin: string (nullable = true)
|-- product: struct (nullable = true)
||-- id: integer (nullable = true){code}
 

When I try to select the first 20 rows of the id column, I execute:
{code:java}
df.select("product.id").show(20, false)
{code}
 

and I manage to get the result. But when I execute the following: 
{code:java}
df.map(_.getAs[Int]("product.id")).show(20, false){code}
 

I get the following error:
{code:java}
java.lang.IllegalArgumentException: Field "product.id" does not exist.{code}
 

  was:
I have an input dataframe *df* with the following schema:

 
{code:java}
|-- origin: string (nullable = true)
|-- product: struct (nullable = true)
||-- id: integer (nullable = true){code}
 

When I try to select the first 20 rows of the id column, I execute:

 
{code:java}
df.select("product.id").show(20, false)
{code}
and I manage to get the result. But when I execute the following: 

 
{code:java}
df.map(_.getAs[Int]("product.id")).show(20, false){code}
 

I get the following error:

 
{code:java}
java.lang.IllegalArgumentException: Field "product.id" does not exist.{code}
 


> Exception when trying to access Row field using getAs method
> 
>
> Key: SPARK-36947
> URL: https://issues.apache.org/jira/browse/SPARK-36947
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
> Environment: Spark 3.1.2 (this may affect other versions as well)
>Reporter: Alexandros Mavrommatis
>Priority: Blocker
>  Labels: catalyst, row, sql
>
> I have an input dataframe *df* with the following schema:
> {code:java}
> |-- origin: string (nullable = true)
> |-- product: struct (nullable = true)
> ||-- id: integer (nullable = true){code}
>  
> When I try to select the first 20 rows of the id column, I execute:
> {code:java}
> df.select("product.id").show(20, false)
> {code}
>  
> and I manage to get the result. But when I execute the following: 
> {code:java}
> df.map(_.getAs[Int]("product.id")).show(20, false){code}
>  
> I get the following error:
> {code:java}
> java.lang.IllegalArgumentException: Field "product.id" does not exist.{code}
>  
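The same distinction can be reproduced from Python (the report above uses the Scala API): {{select}} resolves "product.id" as a nested column reference, while a Row only exposes its top-level fields, so the struct has to be stepped into explicitly. A small sketch reusing the reporter's field names:

{code:python}
from pyspark.sql import Row

row = Row(origin="web", product=Row(id=42))

# row["product.id"]          # fails: there is no top-level field named "product.id"
print(row["product"]["id"])  # 42 -- take the struct value first, then its member
{code}

The workaround is the same in either API: fetch the struct value first, then read its field.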



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36947) Exception when trying to access Row field using getAs method

2021-10-07 Thread Alexandros Mavrommatis (Jira)
Alexandros Mavrommatis created SPARK-36947:
--

 Summary: Exception when trying to access Row field using getAs 
method
 Key: SPARK-36947
 URL: https://issues.apache.org/jira/browse/SPARK-36947
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.2
 Environment: Spark 3.1.2 (this may affect other versions as well)
Reporter: Alexandros Mavrommatis


I have an input dataframe *df* with the following schema:

 
{code:java}
|-- origin: string (nullable = true)
|-- product: struct (nullable = true)
||-- id: integer (nullable = true){code}
 

When I try to select the first 20 rows of the id column, I execute:

 
{code:java}
df.select("product.id").show(20, false)
{code}
and I manage to get the result. But when I execute the following: 

 
{code:java}
df.map(_.getAs[Int]("product.id")).show(20, false){code}
 

I get the following error:

 
{code:java}
java.lang.IllegalArgumentException: Field "product.id" does not exist.{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17

2021-10-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425570#comment-17425570
 ] 

Apache Spark commented on SPARK-36900:
--

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/34214

> "SPARK-36464: size returns correct positive number even with over 2GB data" 
> will oom with JDK17 
> 
>
> Key: SPARK-36900
> URL: https://issues.apache.org/jira/browse/SPARK-36900
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> Execute
>  
> {code:java}
> build/mvn clean install  -pl core -am -Dtest=none 
> -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite
> {code}
> with JDK 17,
> {code:java}
> ChunkedByteBufferOutputStreamSuite:
> - empty output
> - write a single byte
> - write a single near boundary
> - write a single at boundary
> - single chunk output
> - single chunk output at boundary size
> - multiple chunk output
> - multiple chunk output at boundary size
> *** RUN ABORTED ***
>   java.lang.OutOfMemoryError: Java heap space
>   at java.base/java.lang.Integer.valueOf(Integer.java:1081)
>   at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
>   at java.base/java.io.OutputStream.write(OutputStream.java:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown
>  Source)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17

2021-10-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425568#comment-17425568
 ] 

Apache Spark commented on SPARK-36900:
--

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/34214

> "SPARK-36464: size returns correct positive number even with over 2GB data" 
> will oom with JDK17 
> 
>
> Key: SPARK-36900
> URL: https://issues.apache.org/jira/browse/SPARK-36900
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> Execute
>  
> {code:java}
> build/mvn clean install  -pl core -am -Dtest=none 
> -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite
> {code}
> with JDK 17,
> {code:java}
> ChunkedByteBufferOutputStreamSuite:
> - empty output
> - write a single byte
> - write a single near boundary
> - write a single at boundary
> - single chunk output
> - single chunk output at boundary size
> - multiple chunk output
> - multiple chunk output at boundary size
> *** RUN ABORTED ***
>   java.lang.OutOfMemoryError: Java heap space
>   at java.base/java.lang.Integer.valueOf(Integer.java:1081)
>   at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
>   at java.base/java.io.OutputStream.write(OutputStream.java:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown
>  Source)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17

2021-10-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36900:


Assignee: Apache Spark

> "SPARK-36464: size returns correct positive number even with over 2GB data" 
> will oom with JDK17 
> 
>
> Key: SPARK-36900
> URL: https://issues.apache.org/jira/browse/SPARK-36900
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> Execute
>  
> {code:java}
> build/mvn clean install  -pl core -am -Dtest=none 
> -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite
> {code}
> with JDK 17,
> {code:java}
> ChunkedByteBufferOutputStreamSuite:
> - empty output
> - write a single byte
> - write a single near boundary
> - write a single at boundary
> - single chunk output
> - single chunk output at boundary size
> - multiple chunk output
> - multiple chunk output at boundary size
> *** RUN ABORTED ***
>   java.lang.OutOfMemoryError: Java heap space
>   at java.base/java.lang.Integer.valueOf(Integer.java:1081)
>   at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
>   at java.base/java.io.OutputStream.write(OutputStream.java:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown
>  Source)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17

2021-10-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36900:


Assignee: (was: Apache Spark)

> "SPARK-36464: size returns correct positive number even with over 2GB data" 
> will oom with JDK17 
> 
>
> Key: SPARK-36900
> URL: https://issues.apache.org/jira/browse/SPARK-36900
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> Execute
>  
> {code:java}
> build/mvn clean install  -pl core -am -Dtest=none 
> -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite
> {code}
> with JDK 17,
> {code:java}
> ChunkedByteBufferOutputStreamSuite:
> - empty output
> - write a single byte
> - write a single near boundary
> - write a single at boundary
> - single chunk output
> - single chunk output at boundary size
> - multiple chunk output
> - multiple chunk output at boundary size
> *** RUN ABORTED ***
>   java.lang.OutOfMemoryError: Java heap space
>   at java.base/java.lang.Integer.valueOf(Integer.java:1081)
>   at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
>   at java.base/java.io.OutputStream.write(OutputStream.java:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown
>  Source)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17

2021-10-07 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-36900:
-
Priority: Minor  (was: Major)

> "SPARK-36464: size returns correct positive number even with over 2GB data" 
> will oom with JDK17 
> 
>
> Key: SPARK-36900
> URL: https://issues.apache.org/jira/browse/SPARK-36900
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> Execute
>  
> {code:java}
> build/mvn clean install  -pl core -am -Dtest=none 
> -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite
> {code}
> with JDK 17,
> {code:java}
> ChunkedByteBufferOutputStreamSuite:
> - empty output
> - write a single byte
> - write a single near boundary
> - write a single at boundary
> - single chunk output
> - single chunk output at boundary size
> - multiple chunk output
> - multiple chunk output at boundary size
> *** RUN ABORTED ***
>   java.lang.OutOfMemoryError: Java heap space
>   at java.base/java.lang.Integer.valueOf(Integer.java:1081)
>   at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
>   at java.base/java.io.OutputStream.write(OutputStream.java:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown
>  Source)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36798) When SparkContext is stopped, metrics system should be flushed after listeners have finished processing

2021-10-07 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-36798:
---

Assignee: Harsh Panchal

> When SparkContext is stopped, metrics system should be flushed after 
> listeners have finished processing
> ---
>
> Key: SPARK-36798
> URL: https://issues.apache.org/jira/browse/SPARK-36798
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.2
>Reporter: Harsh Panchal
>Assignee: Harsh Panchal
>Priority: Minor
>
> In the current implementation, when {{SparkContext.stop()}} is called, 
> {{metricsSystem.report()}} is called before {{listenerBus.stop()}}. In this 
> case, any metrics a listener is still producing never reach the sink.
> Background:
> We have ingestion jobs in Spark Structured Streaming. To monitor them, we 
> collect metrics such as the number of input rows and trigger time from the 
> {{QueryProgressEvent}} received via {{StreamingQueryListener}}. These metrics 
> are then pushed to a DB by custom sinks registered in {{MetricsSystem}}. We 
> noticed that these metrics are occasionally lost for the last batch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36798) When SparkContext is stopped, metrics system should be flushed after listeners have finished processing

2021-10-07 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-36798.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34039
[https://github.com/apache/spark/pull/34039]

> When SparkContext is stopped, metrics system should be flushed after 
> listeners have finished processing
> ---
>
> Key: SPARK-36798
> URL: https://issues.apache.org/jira/browse/SPARK-36798
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.2
>Reporter: Harsh Panchal
>Assignee: Harsh Panchal
>Priority: Minor
> Fix For: 3.3.0
>
>
> In the current implementation, when {{SparkContext.stop()}} is called, 
> {{metricsSystem.report()}} is called before {{listenerBus.stop()}}. In this 
> case, any metrics a listener is still producing never reach the sink.
> Background:
> We have ingestion jobs in Spark Structured Streaming. To monitor them, we 
> collect metrics such as the number of input rows and trigger time from the 
> {{QueryProgressEvent}} received via {{StreamingQueryListener}}. These metrics 
> are then pushed to a DB by custom sinks registered in {{MetricsSystem}}. We 
> noticed that these metrics are occasionally lost for the last batch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36396) Implement DataFrame.cov

2021-10-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425510#comment-17425510
 ] 

Apache Spark commented on SPARK-36396:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34213

> Implement DataFrame.cov
> ---
>
> Key: SPARK-36396
> URL: https://issues.apache.org/jira/browse/SPARK-36396
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>
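The ticket carries no description, but the intent is parity with pandas, whose {{DataFrame.cov}} returns the pairwise covariance matrix of the numeric columns. A quick reference for the behaviour the pandas-on-Spark version is expected to mirror (sample data is arbitrary):

{code:python}
import pandas as pd
import pyspark.pandas as ps

pdf = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [2.0, 1.0, 4.0]})
print(pdf.cov())         # pandas reference: a 2x2 covariance matrix

psdf = ps.from_pandas(pdf)
print(psdf.cov())        # expected to match once SPARK-36396 is implemented
{code}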




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36402) Implement Series.combine

2021-10-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425493#comment-17425493
 ] 

Apache Spark commented on SPARK-36402:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34212

> Implement Series.combine
> 
>
> Key: SPARK-36402
> URL: https://issues.apache.org/jira/browse/SPARK-36402
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>
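As with DataFrame.cov above, the target semantics come from pandas: {{Series.combine(other, func, fill_value)}} merges two Series element-wise over the union of their indexes, applying {{func}} and substituting {{fill_value}} where one side has no value. A short pandas reference the pandas-on-Spark implementation is expected to follow:

{code:python}
import pandas as pd

s1 = pd.Series([1, 5, 2], index=["a", "b", "c"])
s2 = pd.Series([3, 4], index=["a", "b"])

# Element-wise max; fill_value stands in where a label is missing from one Series.
print(s1.combine(s2, max, fill_value=0))
# a    3
# b    5
# c    2
{code}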




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36946) Support time for ps.to_datetime

2021-10-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425473#comment-17425473
 ] 

Apache Spark commented on SPARK-36946:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34211

> Support time for ps.to_datetime
> ---
>
> Key: SPARK-36946
> URL: https://issues.apache.org/jira/browse/SPARK-36946
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36946) Support time for ps.to_datetime

2021-10-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36946:


Assignee: Apache Spark

> Support time for ps.to_datetime
> ---
>
> Key: SPARK-36946
> URL: https://issues.apache.org/jira/browse/SPARK-36946
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36946) Support time for ps.to_datetime

2021-10-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36946:


Assignee: (was: Apache Spark)

> Support time for ps.to_datetime
> ---
>
> Key: SPARK-36946
> URL: https://issues.apache.org/jira/browse/SPARK-36946
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36946) Support time for ps.to_datetime

2021-10-07 Thread dgd_contributor (Jira)
dgd_contributor created SPARK-36946:
---

 Summary: Support time for ps.to_datetime
 Key: SPARK-36946
 URL: https://issues.apache.org/jira/browse/SPARK-36946
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dgd_contributor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36707) Support to specify index type and name in pandas API on Spark

2021-10-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36707.
--
Fix Version/s: 3.3.0
 Assignee: Hyukjin Kwon
   Resolution: Done

> Support to specify index type and name in pandas API on Spark
> -
>
> Key: SPARK-36707
> URL: https://issues.apache.org/jira/browse/SPARK-36707
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.3.0
>
>
> See https://koalas.readthedocs.io/en/latest/user_guide/typehints.html.
> In pandas API on Spark there is currently no way to specify the index type and 
> name in the output when you apply an arbitrary function, which forces it to 
> create the default index:
> {code}
> >>> def transform(pdf) -> pd.DataFrame["id": int, "A": int]:
> ...     pdf['A'] = pdf.id + 1
> ...     return pdf
> ...
> >>> ps.range(5).koalas.apply_batch(transform)
> {code}
> {code}
>id   A
> 0   0   1
> 1   1   2
> 2   2   3
> 3   3   4
> 4   4   5
> {code}
> We should have a way to specify the index.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36713) Document new syntax for specifying index type

2021-10-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36713.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34210
[https://github.com/apache/spark/pull/34210]

> Document new syntax for specifying index type
> -
>
> Key: SPARK-36713
> URL: https://issues.apache.org/jira/browse/SPARK-36713
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36713) Document new syntax for specifying index type

2021-10-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-36713:


Assignee: Hyukjin Kwon

> Document new syntax for specifying index type
> -
>
> Key: SPARK-36713
> URL: https://issues.apache.org/jira/browse/SPARK-36713
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org