[jira] [Assigned] (SPARK-42722) Python Connect def schema() should not cache the schema

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42722:


Assignee: Rui Wang  (was: Apache Spark)

> Python Connect def schema() should not cache the schema 
> 
>
> Key: SPARK-42722
> URL: https://issues.apache.org/jira/browse/SPARK-42722
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Commented] (SPARK-42721) Add an Interceptor to log RPCs in connect-server

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698057#comment-17698057
 ] 

Apache Spark commented on SPARK-42721:
--

User 'rangadi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40342

> Add an Interceptor to log RPCs in connect-server
> 
>
> Key: SPARK-42721
> URL: https://issues.apache.org/jira/browse/SPARK-42721
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Priority: Major
> Fix For: 3.5.0
>
>
> It would be useful to be able to log RPCs to the Connect server during 
> development. It makes it simpler to see the flow of messages.
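For illustration, a minimal sketch of such a logging interceptor, assuming a
gRPC-based server (the class below is a hypothetical sketch, not the patch in
the PR):

{code:scala}
import io.grpc.ForwardingServerCallListener.SimpleForwardingServerCallListener
import io.grpc.{Metadata, ServerCall, ServerCallHandler, ServerInterceptor}

// Hypothetical sketch: log the full method name of each incoming RPC and
// every request message as it arrives, so the message flow is visible.
class LoggingInterceptor extends ServerInterceptor {
  override def interceptCall[ReqT, RespT](
      call: ServerCall[ReqT, RespT],
      headers: Metadata,
      next: ServerCallHandler[ReqT, RespT]): ServerCall.Listener[ReqT] = {
    println(s"RPC: ${call.getMethodDescriptor.getFullMethodName}")
    new SimpleForwardingServerCallListener[ReqT](next.startCall(call, headers)) {
      override def onMessage(message: ReqT): Unit = {
        println(s"Request: $message")
        super.onMessage(message)
      }
    }
  }
}
{code}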






[jira] [Assigned] (SPARK-42721) Add an Interceptor to log RPCs in connect-server

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42721:


Assignee: (was: Apache Spark)

> Add an Interceptor to log RPCs in connect-server
> 
>
> Key: SPARK-42721
> URL: https://issues.apache.org/jira/browse/SPARK-42721
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Priority: Major
> Fix For: 3.5.0
>
>
> It would be useful to be able to log RPCs to the Connect server during 
> development. It makes it simpler to see the flow of messages.






[jira] [Assigned] (SPARK-42721) Add an Interceptor to log RPCs in connect-server

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42721:


Assignee: Apache Spark

> Add an Interceptor to log RPCs in connect-server
> 
>
> Key: SPARK-42721
> URL: https://issues.apache.org/jira/browse/SPARK-42721
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.5.0
>
>
> It would be useful to be able to log RPCs to the Connect server during 
> development. It makes it simpler to see the flow of messages.






[jira] [Commented] (SPARK-42721) Add an Interceptor to log RPCs in connect-server

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698055#comment-17698055
 ] 

Apache Spark commented on SPARK-42721:
--

User 'rangadi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40342

> Add an Interceptor to log RPCs in connect-server
> 
>
> Key: SPARK-42721
> URL: https://issues.apache.org/jira/browse/SPARK-42721
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Priority: Major
> Fix For: 3.5.0
>
>
> It would be useful to be able to log RPCs to the Connect server during 
> development. It makes it simpler to see the flow of messages.






[jira] [Commented] (SPARK-42715) NegativeArraySizeException when too much data is read from an ORC file

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697999#comment-17697999
 ] 

Apache Spark commented on SPARK-42715:
--

User 'chong0929' has created a pull request for this issue:
https://github.com/apache/spark/pull/40341

> NegativeArraySizeException when too much data is read from an ORC file
> ---
>
> Key: SPARK-42715
> URL: https://issues.apache.org/jira/browse/SPARK-42715
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: XiaoLong Wu
>Priority: Minor
>
> Should we provide a friendlier exception message explaining how to avoid this 
> exception? For example, when we catch it, we could tell the user to reduce 
> the value of spark.sql.orc.columnarReaderBatchSize.
> In the current version, batch reading of ORC files is done by 
> OrcColumnarBatchReader.nextBatch(), which depends on 
> [ORC|https://github.com/apache/orc] (version 1.8.2) to complete the data 
> copy. The relevant ORC code is as follows:
> {code:java}
> private static byte[] commonReadByteArrays(InStream stream, IntegerReader 
> lengths,
> LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize) throws IOException {
>   // Read lengths
>   scratchlcv.isRepeating = result.isRepeating;
>   scratchlcv.noNulls = result.noNulls;
>   scratchlcv.isNull = result.isNull;  // Notice we are replacing the isNull 
> vector here...
>   lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
>   int totalLength = 0;
>   if (!scratchlcv.isRepeating) {
> for (int i = 0; i < batchSize; i++) {
>   if (!scratchlcv.isNull[i]) {
> totalLength += (int) scratchlcv.vector[i];
>   }
> }
>   } else {
> if (!scratchlcv.isNull[0]) {
>   totalLength = (int) (batchSize * scratchlcv.vector[0]);
> }
>   }
>   // Read all the strings for this batch
>   byte[] allBytes = new byte[totalLength];
>   int offset = 0;
>   int len = totalLength;
>   while (len > 0) {
> int bytesRead = stream.read(allBytes, offset, len);
> if (bytesRead < 0) {
>   throw new EOFException("Can't finish byte read from " + stream);
> }
> len -= bytesRead;
> offset += bytesRead;
>   }
>   return allBytes;
> } {code}
>  As shown above, the per-value lengths are long values that are accumulated 
> into the int totalLength. If the total data size exceeds Integer.MAX_VALUE, 
> the conversion to int overflows to a negative value, and allocating the 
> array throws the following exception:
> {code:java}
> Caused by: java.lang.NegativeArraySizeException
>     at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1998)
>     at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2021)
>     at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2119)
>     at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1962)
>     at 
> org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65)
>     at 
> org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100)
>     at 
> org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77)
>     at 
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1371)
>     at 
> org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:197)
>     at 
> org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99)
>     at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:274)
>     ... 20 more {code}
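A minimal sketch of the overflow itself (the numbers are hypothetical; the
batch size plays the role of spark.sql.orc.columnarReaderBatchSize):

{code:scala}
// Summing long string lengths into an Int wraps around once the total
// exceeds Int.MaxValue (2147483647), yielding a negative array size.
val batchSize = 4096                 // hypothetical batch size
val bytesPerValue = 600000L          // hypothetical large string values
val totalLength = (batchSize * bytesPerValue).toInt
println(totalLength)                 // -1837367296
// new Array[Byte](totalLength)      // would throw NegativeArraySizeException
{code}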






[jira] [Assigned] (SPARK-42715) NegativeArraySizeException when too much data is read from an ORC file

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42715:


Assignee: Apache Spark

> NegativeArraySizeException when too much data is read from an ORC file
> ---
>
> Key: SPARK-42715
> URL: https://issues.apache.org/jira/browse/SPARK-42715
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: XiaoLong Wu
>Assignee: Apache Spark
>Priority: Minor
>
> Should we provide a friendlier exception message explaining how to avoid this 
> exception? For example, when we catch it, we could tell the user to reduce 
> the value of spark.sql.orc.columnarReaderBatchSize.
> In the current version, batch reading of ORC files is done by 
> OrcColumnarBatchReader.nextBatch(), which depends on 
> [ORC|https://github.com/apache/orc] (version 1.8.2) to complete the data 
> copy. The relevant ORC code is as follows:
> {code:java}
> private static byte[] commonReadByteArrays(InStream stream, IntegerReader 
> lengths,
> LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize) throws IOException {
>   // Read lengths
>   scratchlcv.isRepeating = result.isRepeating;
>   scratchlcv.noNulls = result.noNulls;
>   scratchlcv.isNull = result.isNull;  // Notice we are replacing the isNull 
> vector here...
>   lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
>   int totalLength = 0;
>   if (!scratchlcv.isRepeating) {
> for (int i = 0; i < batchSize; i++) {
>   if (!scratchlcv.isNull[i]) {
> totalLength += (int) scratchlcv.vector[i];
>   }
> }
>   } else {
> if (!scratchlcv.isNull[0]) {
>   totalLength = (int) (batchSize * scratchlcv.vector[0]);
> }
>   }
>   // Read all the strings for this batch
>   byte[] allBytes = new byte[totalLength];
>   int offset = 0;
>   int len = totalLength;
>   while (len > 0) {
> int bytesRead = stream.read(allBytes, offset, len);
> if (bytesRead < 0) {
>   throw new EOFException("Can't finish byte read from " + stream);
> }
> len -= bytesRead;
> offset += bytesRead;
>   }
>   return allBytes;
> } {code}
>  As shown above, the per-value lengths are long values that are accumulated 
> into the int totalLength. If the total data size exceeds Integer.MAX_VALUE, 
> the conversion to int overflows to a negative value, and allocating the 
> array throws the following exception:
> {code:java}
> Caused by: java.lang.NegativeArraySizeException
>     at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1998)
>     at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2021)
>     at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2119)
>     at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1962)
>     at 
> org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65)
>     at 
> org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100)
>     at 
> org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77)
>     at 
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1371)
>     at 
> org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:197)
>     at 
> org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99)
>     at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:274)
>     ... 20 more {code}






[jira] [Assigned] (SPARK-42715) NegativeArraySizeException when too much data is read from an ORC file

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42715:


Assignee: (was: Apache Spark)

> NegativeArraySizeException when too much data is read from an ORC file
> ---
>
> Key: SPARK-42715
> URL: https://issues.apache.org/jira/browse/SPARK-42715
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: XiaoLong Wu
>Priority: Minor
>
> Should we provide a friendlier exception message explaining how to avoid this 
> exception? For example, when we catch it, we could tell the user to reduce 
> the value of spark.sql.orc.columnarReaderBatchSize.
> In the current version, batch reading of ORC files is done by 
> OrcColumnarBatchReader.nextBatch(), which depends on 
> [ORC|https://github.com/apache/orc] (version 1.8.2) to complete the data 
> copy. The relevant ORC code is as follows:
> {code:java}
> private static byte[] commonReadByteArrays(InStream stream, IntegerReader 
> lengths,
> LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize) throws IOException {
>   // Read lengths
>   scratchlcv.isRepeating = result.isRepeating;
>   scratchlcv.noNulls = result.noNulls;
>   scratchlcv.isNull = result.isNull;  // Notice we are replacing the isNull 
> vector here...
>   lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
>   int totalLength = 0;
>   if (!scratchlcv.isRepeating) {
> for (int i = 0; i < batchSize; i++) {
>   if (!scratchlcv.isNull[i]) {
> totalLength += (int) scratchlcv.vector[i];
>   }
> }
>   } else {
> if (!scratchlcv.isNull[0]) {
>   totalLength = (int) (batchSize * scratchlcv.vector[0]);
> }
>   }
>   // Read all the strings for this batch
>   byte[] allBytes = new byte[totalLength];
>   int offset = 0;
>   int len = totalLength;
>   while (len > 0) {
> int bytesRead = stream.read(allBytes, offset, len);
> if (bytesRead < 0) {
>   throw new EOFException("Can't finish byte read from " + stream);
> }
> len -= bytesRead;
> offset += bytesRead;
>   }
>   return allBytes;
> } {code}
>  As shown above, the per-value lengths are long values that are accumulated 
> into the int totalLength. If the total data size exceeds Integer.MAX_VALUE, 
> the conversion to int overflows to a negative value, and allocating the 
> array throws the following exception:
> {code:java}
> Caused by: java.lang.NegativeArraySizeException
>     at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1998)
>     at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2021)
>     at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2119)
>     at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1962)
>     at 
> org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65)
>     at 
> org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100)
>     at 
> org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77)
>     at 
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1371)
>     at 
> org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:197)
>     at 
> org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99)
>     at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:274)
>     ... 20 more {code}






[jira] [Assigned] (SPARK-42701) Add the try_aes_decrypt() function

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42701:


Assignee: Max Gekk  (was: Apache Spark)

> Add the try_aes_decrypt() function
> --
>
> Key: SPARK-42701
> URL: https://issues.apache.org/jira/browse/SPARK-42701
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: starter
>
> Add the new function try_aes_decrypt(). The function aes_decrypt() fails with 
> an exception when it encounters a column value that it cannot decrypt. So, if 
> a column contains both bad and good input, it is impossible to decrypt even 
> the good input.
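A hedged usage sketch of the proposed function (the table and column names are
hypothetical; aes keys are 16, 24, or 32 bytes):

{code:scala}
// Unlike aes_decrypt, try_aes_decrypt should return NULL for values it
// cannot decrypt instead of failing the whole query.
spark.sql(
  """SELECT try_aes_decrypt(payload, 'abcdefghijklmnop') AS decrypted
    |FROM events""".stripMargin).show()
// rows with undecryptable payloads yield NULL; good rows still decrypt
{code}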






[jira] [Assigned] (SPARK-42701) Add the try_aes_decrypt() function

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42701:


Assignee: Apache Spark  (was: Max Gekk)

> Add the try_aes_decrypt() function
> --
>
> Key: SPARK-42701
> URL: https://issues.apache.org/jira/browse/SPARK-42701
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>  Labels: starter
>
> Add the new function try_aes_decrypt(). The function aes_decrypt() fails with 
> an exception when it encounters a column value that it cannot decrypt. So, if 
> a column contains both bad and good input, it is impossible to decrypt even 
> the good input.






[jira] [Commented] (SPARK-42701) Add the try_aes_decrypt() function

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697923#comment-17697923
 ] 

Apache Spark commented on SPARK-42701:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/40340

> Add the try_aes_decrypt() function
> --
>
> Key: SPARK-42701
> URL: https://issues.apache.org/jira/browse/SPARK-42701
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: starter
>
> Add the new function try_aes_decrypt(). The function aes_decrypt() fails with 
> an exception when it encounters a column value that it cannot decrypt. So, if 
> a column contains both bad and good input, it is impossible to decrypt even 
> the good input.






[jira] [Assigned] (SPARK-42719) `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled`

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42719:


Assignee: (was: Apache Spark)

> `MapOutputTracker#getMapLocation` should respect  
> `spark.shuffle.reduceLocality.enabled`
> 
>
> Key: SPARK-42719
> URL: https://issues.apache.org/jira/browse/SPARK-42719
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: He Qi
>Priority: Major
>
> Discussed in [https://github.com/apache/spark/pull/40307].
> Conceptually, {{getPreferredLocations}} in {{ShuffledRowRDD}} should return 
> {{Nil}} right away when {{spark.shuffle.reduceLocality.enabled = false}}.
> This logic is pushed into MapOutputTracker, though: 
> {{getPreferredLocationsForShuffle}} honors 
> {{spark.shuffle.reduceLocality.enabled}}, but {{getMapLocation}} does not.
> So the fix is to make {{getMapLocation}} honor the parameter.
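A self-contained sketch of the proposed guard (the real
{{MapOutputTracker#getMapLocation}} signature differs; this only illustrates
honoring the flag with an early return):

{code:scala}
// Hypothetical simplification: honor the locality flag before doing any
// map-location lookup, mirroring getPreferredLocationsForShuffle.
def getMapLocation(
    reduceLocalityEnabled: Boolean,  // spark.shuffle.reduceLocality.enabled
    lookupLocations: () => Seq[String]): Seq[String] = {
  if (!reduceLocalityEnabled) Nil    // skip locality preferences entirely
  else lookupLocations()
}
{code}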






[jira] [Commented] (SPARK-42719) `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled`

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697891#comment-17697891
 ] 

Apache Spark commented on SPARK-42719:
--

User 'jerqi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40339

> `MapOutputTracker#getMapLocation` should respect  
> `spark.shuffle.reduceLocality.enabled`
> 
>
> Key: SPARK-42719
> URL: https://issues.apache.org/jira/browse/SPARK-42719
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: He Qi
>Priority: Major
>
> Discussed in [https://github.com/apache/spark/pull/40307].
> Conceptually, {{getPreferredLocations}} in {{ShuffledRowRDD}} should return 
> {{Nil}} right away when {{spark.shuffle.reduceLocality.enabled = false}}.
> This logic is pushed into MapOutputTracker, though: 
> {{getPreferredLocationsForShuffle}} honors 
> {{spark.shuffle.reduceLocality.enabled}}, but {{getMapLocation}} does not.
> So the fix is to make {{getMapLocation}} honor the parameter.






[jira] [Assigned] (SPARK-42719) `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled`

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42719:


Assignee: Apache Spark

> `MapOutputTracker#getMapLocation` should respect  
> `spark.shuffle.reduceLocality.enabled`
> 
>
> Key: SPARK-42719
> URL: https://issues.apache.org/jira/browse/SPARK-42719
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: He Qi
>Assignee: Apache Spark
>Priority: Major
>
> Discussed in [https://github.com/apache/spark/pull/40307].
> Conceptually, {{getPreferredLocations}} in {{ShuffledRowRDD}} should return 
> {{Nil}} right away when {{spark.shuffle.reduceLocality.enabled = false}}.
> This logic is pushed into MapOutputTracker, though: 
> {{getPreferredLocationsForShuffle}} honors 
> {{spark.shuffle.reduceLocality.enabled}}, but {{getMapLocation}} does not.
> So the fix is to make {{getMapLocation}} honor the parameter.






[jira] [Assigned] (SPARK-42718) Upgrade rocksdbjni to 7.10.2

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42718:


Assignee: Apache Spark

> Upgrade rocksdbjni to 7.10.2
> 
>
> Key: SPARK-42718
> URL: https://issues.apache.org/jira/browse/SPARK-42718
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> https://github.com/facebook/rocksdb/releases/tag/v7.10.2






[jira] [Assigned] (SPARK-42718) Upgrade rocksdbjni to 7.10.2

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42718:


Assignee: (was: Apache Spark)

> Upgrade rocksdbjni to 7.10.2
> 
>
> Key: SPARK-42718
> URL: https://issues.apache.org/jira/browse/SPARK-42718
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/facebook/rocksdb/releases/tag/v7.10.2






[jira] [Commented] (SPARK-42718) Upgrade rocksdbjni to 7.10.2

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697875#comment-17697875
 ] 

Apache Spark commented on SPARK-42718:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40337

> Upgrade rocksdbjni to 7.10.2
> 
>
> Key: SPARK-42718
> URL: https://issues.apache.org/jira/browse/SPARK-42718
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/facebook/rocksdb/releases/tag/v7.10.2






[jira] [Assigned] (SPARK-42706) List the error classes in the user-facing documentation.

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42706:


Assignee: (was: Apache Spark)

> List the error classes in the user-facing documentation.
> --
>
> Key: SPARK-42706
> URL: https://issues.apache.org/jira/browse/SPARK-42706
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We need to add a list of error classes to the user-facing documentation.






[jira] [Assigned] (SPARK-42706) List the error classes in the user-facing documentation.

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42706:


Assignee: Apache Spark

> List the error classes in the user-facing documentation.
> --
>
> Key: SPARK-42706
> URL: https://issues.apache.org/jira/browse/SPARK-42706
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We need to add a list of error classes to the user-facing documentation.






[jira] [Commented] (SPARK-42706) List the error classes in the user-facing documentation.

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697863#comment-17697863
 ] 

Apache Spark commented on SPARK-42706:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40336

> List the error classes in the user-facing documentation.
> --
>
> Key: SPARK-42706
> URL: https://issues.apache.org/jira/browse/SPARK-42706
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We need to add a list of error classes to the user-facing documentation.






[jira] [Commented] (SPARK-42717) Upgrade mysql-connector-java from 8.0.31 to 8.0.32

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697850#comment-17697850
 ] 

Apache Spark commented on SPARK-42717:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40335

> Upgrade mysql-connector-java from 8.0.31 to 8.0.32
> --
>
> Key: SPARK-42717
> URL: https://issues.apache.org/jira/browse/SPARK-42717
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Commented] (SPARK-42717) Upgrade mysql-connector-java from 8.0.31 to 8.0.32

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697851#comment-17697851
 ] 

Apache Spark commented on SPARK-42717:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40335

> Upgrade mysql-connector-java from 8.0.31 to 8.0.32
> --
>
> Key: SPARK-42717
> URL: https://issues.apache.org/jira/browse/SPARK-42717
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Assigned] (SPARK-42717) Upgrade mysql-connector-java from 8.0.31 to 8.0.32

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42717:


Assignee: Apache Spark

> Upgrade mysql-connector-java from 8.0.31 to 8.0.32
> --
>
> Key: SPARK-42717
> URL: https://issues.apache.org/jira/browse/SPARK-42717
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Assigned] (SPARK-42717) Upgrade mysql-connector-java from 8.0.31 to 8.0.32

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42717:


Assignee: (was: Apache Spark)

> Upgrade mysql-connector-java from 8.0.31 to 8.0.32
> --
>
> Key: SPARK-42717
> URL: https://issues.apache.org/jira/browse/SPARK-42717
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Commented] (SPARK-42716) DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697847#comment-17697847
 ] 

Apache Spark commented on SPARK-42716:
--

User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40334

> DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per 
> partition
> --
>
> Key: SPARK-42716
> URL: https://issues.apache.org/jira/browse/SPARK-42716
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.4.0, 3.4.1
>Reporter: Enrico Minack
>Priority: Major
>
> From Spark 3.0.0 until 3.2.3, a DataSourceV2 could report its partitioning as 
> {{KeyGroupedPartitioning}} via {{SupportsReportPartitioning}}, even if 
> multiple keys belong to a partition.
> Since SPARK-37377, the partition information reported through 
> {{SupportsReportPartitioning}} is considered by Catalyst only if all 
> partitions implement {{HasPartitionKey}}. But this limits the number of keys 
> per partition to 1.
> Spark should continue to support the more general situation of 
> {{KeyGroupedPartitioning}} with multiple keys per partition, like 
> {{HashPartitioning}}.
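A conceptual sketch of the limitation, using plain case classes rather than
the actual DSv2 interfaces:

{code:scala}
// Under the SPARK-37377 behavior, each input partition must expose exactly
// one key (HasPartitionKey), so a source whose partitions each hold several
// keys (e.g. hash buckets) cannot report KeyGroupedPartitioning.
case class Bucket(keys: Set[Int])    // hypothetical partition with its keys
val partitions = Seq(Bucket(Set(1, 4)), Bucket(Set(2, 3)))
val reportable = partitions.forall(_.keys.size == 1)
println(reportable)                  // false: this source cannot report
{code}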






[jira] [Assigned] (SPARK-42716) DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42716:


Assignee: (was: Apache Spark)

> DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per 
> partition
> --
>
> Key: SPARK-42716
> URL: https://issues.apache.org/jira/browse/SPARK-42716
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.4.0, 3.4.1
>Reporter: Enrico Minack
>Priority: Major
>
> From Spark 3.0.0 until 3.2.3, a DataSourceV2 could report its partitioning as 
> {{KeyGroupedPartitioning}} via {{SupportsReportPartitioning}}, even if 
> multiple keys belong to a partition.
> Since SPARK-37377, the partition information reported through 
> {{SupportsReportPartitioning}} is considered by Catalyst only if all 
> partitions implement {{HasPartitionKey}}. But this limits the number of keys 
> per partition to 1.
> Spark should continue to support the more general situation of 
> {{KeyGroupedPartitioning}} with multiple keys per partition, like 
> {{HashPartitioning}}.






[jira] [Assigned] (SPARK-42716) DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42716:


Assignee: Apache Spark

> DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per 
> partition
> --
>
> Key: SPARK-42716
> URL: https://issues.apache.org/jira/browse/SPARK-42716
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.4.0, 3.4.1
>Reporter: Enrico Minack
>Assignee: Apache Spark
>Priority: Major
>
> From Spark 3.0.0 until 3.2.3, a DataSourceV2 could report its partitioning as 
> {{KeyGroupedPartitioning}} via {{SupportsReportPartitioning}}, even if 
> multiple keys belong to a partition.
> Since SPARK-37377, the partition information reported through 
> {{SupportsReportPartitioning}} is considered by Catalyst only if all 
> partitions implement {{HasPartitionKey}}. But this limits the number of keys 
> per partition to 1.
> Spark should continue to support the more general situation of 
> {{KeyGroupedPartitioning}} with multiple keys per partition, like 
> {{HashPartitioning}}.






[jira] [Commented] (SPARK-42716) DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697845#comment-17697845
 ] 

Apache Spark commented on SPARK-42716:
--

User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40334

> DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per 
> partition
> --
>
> Key: SPARK-42716
> URL: https://issues.apache.org/jira/browse/SPARK-42716
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.4.0, 3.4.1
>Reporter: Enrico Minack
>Priority: Major
>
> From Spark 3.0.0 until 3.2.3, a DataSourceV2 could report its partitioning as 
> {{KeyGroupedPartitioning}} via {{SupportsReportPartitioning}}, even if 
> multiple keys belong to a partition.
> Since SPARK-37377, the partition information reported through 
> {{SupportsReportPartitioning}} is considered by Catalyst only if all 
> partitions implement {{HasPartitionKey}}. But this limits the number of keys 
> per partition to 1.
> Spark should continue to support the more general situation of 
> {{KeyGroupedPartitioning}} with multiple keys per partition, like 
> {{HashPartitioning}}.






[jira] [Assigned] (SPARK-42623) parameter markers not blocked in DDL

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42623:


Assignee: Apache Spark

> parameter markers not blocked in DDL
> 
>
> Key: SPARK-42623
> URL: https://issues.apache.org/jira/browse/SPARK-42623
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Assignee: Apache Spark
>Priority: Major
>
> The parameterized query code does not block DDL statements from referencing 
> parameter markers.
> E.g.:
> {code:java}
> scala> spark.sql(sqlText = "CREATE VIEW v1 AS SELECT current_timestamp() + 
> :later as stamp, :x * :x AS square", args = Map("later" -> "INTERVAL'3' 
> HOUR", "x" -> "15.0")).show()
> ++
> ||
> ++
> ++
> {code}
> It appears we have some protection that fails us when the view is invoked:
>  
> {code:java}
> scala> spark.sql(sqlText = "SELECT * FROM v1", args = Map("later" -> 
> "INTERVAL'3' HOUR", "x" -> "15.0")).show()
> org.apache.spark.sql.AnalysisException: [UNBOUND_SQL_PARAMETER] Found the 
> unbound parameter: `later`. Please, fix `args` and provide a mapping of the 
> parameter to a SQL literal.; line 1 pos 29
> {code}
> Right now I think the affected statements are:
> * DEFAULT definitions
> * VIEW definitions
> but any other future standard expression popping up is at risk, such as SQL 
> functions or GENERATED COLUMN.
> CREATE TABLE AS is debatable, since it executes the query at definition time 
> only.
> For simplicity I propose to block the feature in ANY DDL statement (CREATE, 
> ALTER).
>  
>  
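A minimal sketch of the proposed check (all names are hypothetical; the real
fix would hook into the parser or analyzer):

{code:scala}
// Hypothetical guard: reject a DDL statement if it still references
// unbound parameter markers, before the statement can be persisted.
def assertNoParametersInDdl(
    isDdl: Boolean,
    unboundParameters: Seq[String]): Unit = {
  if (isDdl && unboundParameters.nonEmpty) {
    throw new IllegalArgumentException(
      s"Parameter markers not allowed in DDL: ${unboundParameters.mkString(", ")}")
  }
}
{code}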






[jira] [Assigned] (SPARK-42623) parameter markers not blocked in DDL

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42623:


Assignee: (was: Apache Spark)

> parameter markers not blocked in DDL
> 
>
> Key: SPARK-42623
> URL: https://issues.apache.org/jira/browse/SPARK-42623
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> The parameterized query code does not block DDL statements from referencing 
> parameter markers.
> E.g.:
> {code:java}
> scala> spark.sql(sqlText = "CREATE VIEW v1 AS SELECT current_timestamp() + 
> :later as stamp, :x * :x AS square", args = Map("later" -> "INTERVAL'3' 
> HOUR", "x" -> "15.0")).show()
> ++
> ||
> ++
> ++
> {code}
> It appears we have some protection that fails us when the view is invoked:
>  
> {code:java}
> scala> spark.sql(sqlText = "SELECT * FROM v1", args = Map("later" -> 
> "INTERVAL'3' HOUR", "x" -> "15.0")).show()
> org.apache.spark.sql.AnalysisException: [UNBOUND_SQL_PARAMETER] Found the 
> unbound parameter: `later`. Please, fix `args` and provide a mapping of the 
> parameter to a SQL literal.; line 1 pos 29
> {code}
> Right now I think the affected statements are:
> * DEFAULT definitions
> * VIEW definitions
> but any other future standard expression popping up is at risk, such as SQL 
> functions or GENERATED COLUMN.
> CREATE TABLE AS is debatable, since it executes the query at definition time 
> only.
> For simplicity I propose to block the feature in ANY DDL statement (CREATE, 
> ALTER).
>  
>  






[jira] [Commented] (SPARK-42623) parameter markers not blocked in DDL

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697796#comment-17697796
 ] 

Apache Spark commented on SPARK-42623:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/40333

> parameter markers not blocked in DDL
> 
>
> Key: SPARK-42623
> URL: https://issues.apache.org/jira/browse/SPARK-42623
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> The parameterized query code does not block DDL statements from referencing 
> parameter markers.
> E.g.:
> {code:java}
> scala> spark.sql(sqlText = "CREATE VIEW v1 AS SELECT current_timestamp() + 
> :later as stamp, :x * :x AS square", args = Map("later" -> "INTERVAL'3' 
> HOUR", "x" -> "15.0")).show()
> ++
> ||
> ++
> ++
> {code}
> It appears we have some protection that fails us when the view is invoked:
>  
> {code:java}
> scala> spark.sql(sqlText = "SELECT * FROM v1", args = Map("later" -> 
> "INTERVAL'3' HOUR", "x" -> "15.0")).show()
> org.apache.spark.sql.AnalysisException: [UNBOUND_SQL_PARAMETER] Found the 
> unbound parameter: `later`. Please, fix `args` and provide a mapping of the 
> parameter to a SQL literal.; line 1 pos 29
> {code}
> Right now I think the affected statements are:
> * DEFAULT definitions
> * VIEW definitions
> but any other future standard expression popping up is at risk, such as SQL 
> functions or GENERATED COLUMN.
> CREATE TABLE AS is debatable, since it executes the query at definition time 
> only.
> For simplicity I propose to block the feature in ANY DDL statement (CREATE, 
> ALTER).
>  
>  






[jira] [Assigned] (SPARK-42702) Support parameterized CTE

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42702:


Assignee: Max Gekk  (was: Apache Spark)

> Support parameterized CTE
> -
>
> Key: SPARK-42702
> URL: https://issues.apache.org/jira/browse/SPARK-42702
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Support named parameters in named common table expressions (CTEs). At the 
> moment, such queries fail:
> {code:java}
> CREATE TABLE tbl(namespace STRING) USING parquet
> INSERT INTO tbl SELECT 'abc'
> WITH transitions AS (
>   SELECT * FROM tbl WHERE namespace = :namespace
> ) SELECT * FROM transitions {code}
> with the following error:
> {code:java}
> [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix 
> `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos 
> 38;
> 'WithCTE
> :- 'CTERelationDef 0, false
> :  +- 'SubqueryAlias transitions
> :     +- 'Project [*]
> :        +- 'Filter (namespace#3 = parameter(namespace))
> :           +- SubqueryAlias spark_catalog.default.tbl
> :              +- Relation spark_catalog.default.tbl[namespace#3] parquet
> +- 'Project [*]
>    +- 'SubqueryAlias transitions
>       +- 'CTERelationRef 0, false
> org.apache.spark.sql.AnalysisException: 
> [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix 
> `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos 
> 38;
> 'WithCTE
> :- 'CTERelationDef 0, false
> :  +- 'SubqueryAlias transitions
> :     +- 'Project [*]
> :        +- 'Filter (namespace#3 = parameter(namespace))
> :           +- SubqueryAlias spark_catalog.default.tbl
> :              +- Relation spark_catalog.default.tbl[namespace#3] parquet
> +- 'Project [*]
>    +- 'SubqueryAlias transitions
>       +- 'CTERelationRef 0, false
>     at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5(CheckAnalysis.scala:339)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5$adapted(CheckAnalysis.scala:244)
>  {code}
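A sketch of the expected behavior once supported, reusing the table from the
example above (the bound value is hypothetical; in this API, args values are
SQL literal text):

{code:scala}
// Named parameters should bind inside the CTE body just as in a plain query.
spark.sql(
  sqlText = """WITH transitions AS (
              |  SELECT * FROM tbl WHERE namespace = :namespace
              |) SELECT * FROM transitions""".stripMargin,
  args = Map("namespace" -> "'abc'")).show()
{code}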






[jira] [Assigned] (SPARK-42702) Support parameterized CTE

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42702:


Assignee: Apache Spark  (was: Max Gekk)

> Support parameterized CTE
> -
>
> Key: SPARK-42702
> URL: https://issues.apache.org/jira/browse/SPARK-42702
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Support named parameters in named common table expressions (CTEs). At the 
> moment, such queries fail:
> {code:java}
> CREATE TABLE tbl(namespace STRING) USING parquet
> INSERT INTO tbl SELECT 'abc'
> WITH transitions AS (
>   SELECT * FROM tbl WHERE namespace = :namespace
> ) SELECT * FROM transitions {code}
> with the following error:
> {code:java}
> [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix 
> `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos 
> 38;
> 'WithCTE
> :- 'CTERelationDef 0, false
> :  +- 'SubqueryAlias transitions
> :     +- 'Project [*]
> :        +- 'Filter (namespace#3 = parameter(namespace))
> :           +- SubqueryAlias spark_catalog.default.tbl
> :              +- Relation spark_catalog.default.tbl[namespace#3] parquet
> +- 'Project [*]
>    +- 'SubqueryAlias transitions
>       +- 'CTERelationRef 0, false
> org.apache.spark.sql.AnalysisException: 
> [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix 
> `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos 
> 38;
> 'WithCTE
> :- 'CTERelationDef 0, false
> :  +- 'SubqueryAlias transitions
> :     +- 'Project [*]
> :        +- 'Filter (namespace#3 = parameter(namespace))
> :           +- SubqueryAlias spark_catalog.default.tbl
> :              +- Relation spark_catalog.default.tbl[namespace#3] parquet
> +- 'Project [*]
>    +- 'SubqueryAlias transitions
>       +- 'CTERelationRef 0, false
>     at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5(CheckAnalysis.scala:339)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5$adapted(CheckAnalysis.scala:244)
>  {code}






[jira] [Commented] (SPARK-42702) Support parameterized CTE

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697794#comment-17697794
 ] 

Apache Spark commented on SPARK-42702:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/40333

> Support parameterized CTE
> -
>
> Key: SPARK-42702
> URL: https://issues.apache.org/jira/browse/SPARK-42702
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Support named parameters in named common table expressions (CTEs). At the 
> moment, such queries fail:
> {code:java}
> CREATE TABLE tbl(namespace STRING) USING parquet
> INSERT INTO tbl SELECT 'abc'
> WITH transitions AS (
>   SELECT * FROM tbl WHERE namespace = :namespace
> ) SELECT * FROM transitions {code}
> with the following error:
> {code:java}
> [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix 
> `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos 
> 38;
> 'WithCTE
> :- 'CTERelationDef 0, false
> :  +- 'SubqueryAlias transitions
> :     +- 'Project [*]
> :        +- 'Filter (namespace#3 = parameter(namespace))
> :           +- SubqueryAlias spark_catalog.default.tbl
> :              +- Relation spark_catalog.default.tbl[namespace#3] parquet
> +- 'Project [*]
>    +- 'SubqueryAlias transitions
>       +- 'CTERelationRef 0, false
> org.apache.spark.sql.AnalysisException: 
> [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix 
> `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos 
> 38;
> 'WithCTE
> :- 'CTERelationDef 0, false
> :  +- 'SubqueryAlias transitions
> :     +- 'Project [*]
> :        +- 'Filter (namespace#3 = parameter(namespace))
> :           +- SubqueryAlias spark_catalog.default.tbl
> :              +- Relation spark_catalog.default.tbl[namespace#3] parquet
> +- 'Project [*]
>    +- 'SubqueryAlias transitions
>       +- 'CTERelationRef 0, false
>     at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5(CheckAnalysis.scala:339)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5$adapted(CheckAnalysis.scala:244)
>  {code}






[jira] [Assigned] (SPARK-42690) Implement CSV/JSON parsing functions

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42690:


Assignee: Apache Spark

> Implement CSV/JSON parsing functions
> ---
>
> Key: SPARK-42690
> URL: https://issues.apache.org/jira/browse/SPARK-42690
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>
> Implement the following two methods in DataFrameReader:
>  
>  
> {code:java}
> /**
> * Loads a `Dataset[String]` storing JSON objects (<a href="http://jsonlines.org/">JSON Lines
> * text format or newline-delimited JSON</a>) and returns the result as a 
> * `DataFrame`.
> *
> * Unless the schema is specified using `schema` function, this function goes 
> through the
> * input once to determine the input schema.
> *
> * @param jsonDataset input Dataset with one JSON object per record
> * @since 3.4.0
> */
> def json(jsonDataset: Dataset[String]): DataFrame
> /**
> * Loads a `Dataset[String]` storing CSV rows and returns the result as a 
> * `DataFrame`.
> *
> * If the schema is not specified using `schema` function and `inferSchema` 
> option is enabled,
> * this function goes through the input once to determine the input schema.
> *
> * If the schema is not specified using `schema` function and `inferSchema` 
> option is disabled,
> * it determines the columns as string types and it reads only the first line 
> to determine the
> * names and the number of fields.
> *
> * If `enforceSchema` is set to `false`, only the CSV header in the first 
> * line is checked to conform to the specified or inferred schema.
> *
> * @note if the `header` option is set to `true` when calling this API, all 
> * lines identical to the header will be removed if they exist.
> *
> * @param csvDataset input Dataset with one CSV row per record
> * @since 3.4.0
> */
> def csv(csvDataset: Dataset[String]): DataFrame
> {code}
>  
> For this we need a new message. We cannot use project because we don't know 
> the schema upfront.
>  
> {code:java}
> message Parse {
>   // (Required) Input relation to Parse. The input is expected to have a 
>   // single text column.
>   Relation input = 1;
>   // (Required) The expected format of the text.
>   ParseFormat format = 2;
>   enum ParseFormat {
> PARSE_FORMAT_UNSPECIFIED = 0;
> PARSE_FORMAT_CSV = 1;
> PARSE_FORMAT_JSON = 2;
>   }
> }
> {code}
>  
>  
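A usage sketch of the two methods, which mirror the existing overloads in the
classic DataFrameReader (spark-shell style, as in the examples above):

{code:scala}
// Parse in-memory JSON and CSV lines without touching the file system.
import spark.implicits._

val jsonDs = Seq("""{"a": 1}""", """{"a": 2}""").toDS()
spark.read.json(jsonDs).show()                         // schema inferred

val csvDs = Seq("a,b", "1,2").toDS()
spark.read.option("header", "true").csv(csvDs).show()  // header from first line
{code}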






[jira] [Assigned] (SPARK-42690) Implement CSV/JSON parsing functions

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42690:


Assignee: (was: Apache Spark)

> Implement CSV/JSON parsing functions
> ---
>
> Key: SPARK-42690
> URL: https://issues.apache.org/jira/browse/SPARK-42690
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Implement the following two methods in DataFrameReader:
>  
>  
> {code:java}
> /**
> * Loads a `Dataset[String]` storing JSON objects (<a href="http://jsonlines.org/">JSON Lines
> * text format or newline-delimited JSON</a>) and returns the result as a 
> * `DataFrame`.
> *
> * Unless the schema is specified using `schema` function, this function goes 
> through the
> * input once to determine the input schema.
> *
> * @param jsonDataset input Dataset with one JSON object per record
> * @since 3.4.0
> */
> def json(jsonDataset: Dataset[String]): DataFrame
> /**
> * Loads a `Dataset[String]` storing CSV rows and returns the result as a 
> * `DataFrame`.
> *
> * If the schema is not specified using `schema` function and `inferSchema` 
> option is enabled,
> * this function goes through the input once to determine the input schema.
> *
> * If the schema is not specified using `schema` function and `inferSchema` 
> option is disabled,
> * it determines the columns as string types and it reads only the first line 
> to determine the
> * names and the number of fields.
> *
> * If `enforceSchema` is set to `false`, only the CSV header in the first 
> line is checked
> * to conform to the specified or inferred schema.
> *
> * @note if the `header` option is set to `true` when calling this API, all 
> lines matching
> * the header will be removed if present.
> *
> * @param csvDataset input Dataset with one CSV row per record
> * @since 3.4.0
> */
> def csv(csvDataset: Dataset[String]): DataFrame
> {code}
>  
> For this we need a new message. We cannot use project because we don't know 
> the schema upfront.
>  
> {code:java}
> message Parse {
>   // (Required) Input relation to Parse. The input is expected to have a single 
> text column.
>   Relation input = 1;
>   // (Required) The expected format of the text.
>   ParseFormat format = 2;
>   enum ParseFormat {
> PARSE_FORMAT_UNSPECIFIED = 0;
> PARSE_FORMAT_CSV = 1;
> PARSE_FORMAT_JSON = 2;
>   }
> }
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42690) Implement CSV/JSON parsing functions

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697786#comment-17697786
 ] 

Apache Spark commented on SPARK-42690:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40332

> Implement CSV/JSON parsing functions
> ---
>
> Key: SPARK-42690
> URL: https://issues.apache.org/jira/browse/SPARK-42690
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Implement the following two methods in DataFrameReader:
>  
>  
> {code:java}
> /**
> * Loads a `Dataset[String]` storing JSON objects (<a href="http://jsonlines.org/">JSON Lines
> * text format or newline-delimited JSON</a>) and returns the result as a 
> `DataFrame`.
> *
> * Unless the schema is specified using `schema` function, this function goes 
> through the
> * input once to determine the input schema.
> *
> * @param jsonDataset input Dataset with one JSON object per record
> * @since 3.4.0
> */
> def json(jsonDataset: Dataset[String]): DataFrame
> /**
> * Loads a `Dataset[String]` storing CSV rows and returns the result as a 
> `DataFrame`.
> *
> * If the schema is not specified using `schema` function and `inferSchema` 
> option is enabled,
> * this function goes through the input once to determine the input schema.
> *
> * If the schema is not specified using `schema` function and `inferSchema` 
> option is disabled,
> * it determines the columns as string types and it reads only the first line 
> to determine the
> * names and the number of fields.
> *
> * If `enforceSchema` is set to `false`, only the CSV header in the first 
> line is checked
> * to conform to the specified or inferred schema.
> *
> * @note if the `header` option is set to `true` when calling this API, all 
> lines matching
> * the header will be removed if present.
> *
> * @param csvDataset input Dataset with one CSV row per record
> * @since 3.4.0
> */
> def csv(csvDataset: Dataset[String]): DataFrame
> {code}
>  
> For this we need a new message. We cannot use project because we don't know 
> the schema upfront.
>  
> {code:java}
> message Parse {
>   // (Required) Input relation to Parse. The input is expected to have a single 
> text column.
>   Relation input = 1;
>   // (Required) The expected format of the text.
>   ParseFormat format = 2;
>   enum ParseFormat {
> PARSE_FORMAT_UNSPECIFIED = 0;
> PARSE_FORMAT_CSV = 1;
> PARSE_FORMAT_JSON = 2;
>   }
> }
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42713) Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697756#comment-17697756
 ] 

Apache Spark commented on SPARK-42713:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40331

> Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference
> 
>
> Key: SPARK-42713
> URL: https://issues.apache.org/jira/browse/SPARK-42713
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42713) Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42713:


Assignee: Apache Spark

> Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference
> 
>
> Key: SPARK-42713
> URL: https://issues.apache.org/jira/browse/SPARK-42713
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42713) Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42713:


Assignee: (was: Apache Spark)

> Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference
> 
>
> Key: SPARK-42713
> URL: https://issues.apache.org/jira/browse/SPARK-42713
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42713) Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697755#comment-17697755
 ] 

Apache Spark commented on SPARK-42713:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40331

> Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference
> 
>
> Key: SPARK-42713
> URL: https://issues.apache.org/jira/browse/SPARK-42713
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42712) Improve docstring of mapInPandas and mapInArrow

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42712:


Assignee: Apache Spark

> Improve docstring of mapInPandas and mapInArrow
> ---
>
> Key: SPARK-42712
> URL: https://issues.apache.org/jira/browse/SPARK-42712
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> We'd better call out that they are not scalar.
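> 
> A short PySpark sketch of the iterator-of-batches contract worth calling out 
> (illustrative data and function name):
> {code:python}
> from typing import Iterator
> 
> import pandas as pd
> from pyspark.sql import SparkSession
> 
> spark = SparkSession.builder.getOrCreate()
> df = spark.range(4)
> 
> # The function receives an iterator of pandas DataFrames (one per batch),
> # not one scalar value per row, and yields DataFrames back.
> def double_id(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
>     for batch in batches:
>         yield batch.assign(id=batch["id"] * 2)
> 
> df.mapInPandas(double_id, schema="id long").show()
> {code}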



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42712) Improve docstring of mapInPandas and mapInArrow

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697752#comment-17697752
 ] 

Apache Spark commented on SPARK-42712:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40330

> Improve docstring of mapInPandas and mapInArrow
> ---
>
> Key: SPARK-42712
> URL: https://issues.apache.org/jira/browse/SPARK-42712
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> We'd better call out that they are not scalar.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42712) Improve docstring of mapInPandas and mapInArrow

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42712:


Assignee: (was: Apache Spark)

> Improve docstring of mapInPandas and mapInArrow
> ---
>
> Key: SPARK-42712
> URL: https://issues.apache.org/jira/browse/SPARK-42712
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> We'd better call out that they are not scalar.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42710) Rename FrameMap proto to MapPartitions

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42710:


Assignee: Apache Spark

> Rename FrameMap proto to MapPartitions
> --
>
> Key: SPARK-42710
> URL: https://issues.apache.org/jira/browse/SPARK-42710
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> For readability.
> Frame Map API refers to mapInPandas and mapInArrow, which are equivalent to 
> MapPartitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42710) Rename FrameMap proto to MapPartitions

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42710:


Assignee: (was: Apache Spark)

> Rename FrameMap proto to MapPartitions
> --
>
> Key: SPARK-42710
> URL: https://issues.apache.org/jira/browse/SPARK-42710
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> For readability.
> Frame Map API refers to mapInPandas and mapInArrow, which are equivalent to 
> MapPartitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42710) Rename FrameMap proto to MapPartitions

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697748#comment-17697748
 ] 

Apache Spark commented on SPARK-42710:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40329

> Rename FrameMap proto to MapPartitions
> --
>
> Key: SPARK-42710
> URL: https://issues.apache.org/jira/browse/SPARK-42710
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> For readability.
> Frame Map API refers to mapInPandas and mapInArrow, which are equivalent to 
> MapPartitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42709) Do not rely on __file__

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42709:


Assignee: (was: Apache Spark)

> Do not rely on __file__
> ---
>
> Key: SPARK-42709
> URL: https://issues.apache.org/jira/browse/SPARK-42709
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We have a lot of places using __file__, which is actually optional. We 
> shouldn't rely on it.
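> 
> A sketch of the defensive pattern this implies (illustrative helper, not 
> pyspark's actual replacement):
> {code:python}
> import os
> from importlib import util
> from typing import Optional
> 
> def module_dir(mod_name: str) -> Optional[str]:
>     # __file__ is optional on modules (e.g. frozen modules or namespace
>     # packages), so resolve the location via the import spec instead.
>     spec = util.find_spec(mod_name)
>     if spec is None or spec.origin is None:
>         return None
>     return os.path.dirname(spec.origin)
> 
> print(module_dir("json"))
> {code}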



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42709) Do not rely on __file__

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42709:


Assignee: Apache Spark

> Do not rely on __file__
> ---
>
> Key: SPARK-42709
> URL: https://issues.apache.org/jira/browse/SPARK-42709
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> We have a lot of places using __file__, which is actually optional. We 
> shouldn't rely on it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42709) Do not rely on __file__

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697734#comment-17697734
 ] 

Apache Spark commented on SPARK-42709:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40328

> Do not rely on __file__
> ---
>
> Key: SPARK-42709
> URL: https://issues.apache.org/jira/browse/SPARK-42709
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We have a lot of places using __file__, which is actually optional. We 
> shouldn't rely on it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42266) Local mode should work with IPython

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697727#comment-17697727
 ] 

Apache Spark commented on SPARK-42266:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40327

> Local mode should work with IPython
> ---
>
> Key: SPARK-42266
> URL: https://issues.apache.org/jira/browse/SPARK-42266
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> {code:java}
> (spark_dev) ➜  spark git:(master) bin/pyspark --remote "local[*]"
> Python 3.9.15 (main, Nov 24 2022, 08:28:41) 
> Type 'copyright', 'credits' or 'license' for more information
> IPython 8.9.0 -- An enhanced Interactive Python. Type '?' for help.
> /Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py:45: UserWarning: 
> Failed to initialize Spark session.
>   warnings.warn("Failed to initialize Spark session.")
> Traceback (most recent call last):
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py", line 40, in 
> 
> spark = SparkSession.builder.getOrCreate()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 
> 429, in getOrCreate
> from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/__init__.py", line 
> 21, in 
> from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/dataframe.py", 
> line 35, in 
> import pandas
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", 
> line 29, in 
> from pyspark.pandas.missing.general_functions import 
> MissingPandasLikeGeneralFunctions
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", 
> line 34, in 
> require_minimum_pandas_version()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/pandas/utils.py", 
> line 37, in require_minimum_pandas_version
> if LooseVersion(pandas.__version__) < 
> LooseVersion(minimum_pandas_version):
> AttributeError: partially initialized module 'pandas' has no attribute 
> '__version__' (most likely due to a circular import)
> [TerminalIPythonApp] WARNING | Unknown error in handling PYTHONSTARTUP file 
> /Users/ruifeng.zheng/Dev/spark//python/pyspark/shell.py:
> ---
> AttributeErrorTraceback (most recent call last)
> File ~/Dev/spark/python/pyspark/shell.py:40
>  38 try:
>  39 # Creates pyspark.sql.connect.SparkSession.
> ---> 40 spark = SparkSession.builder.getOrCreate()
>  41 except Exception:
> File ~/Dev/spark/python/pyspark/sql/session.py:429, in 
> SparkSession.Builder.getOrCreate(self)
> 428 with SparkContext._lock:
> --> 429 from pyspark.sql.connect.session import SparkSession as 
> RemoteSparkSession
> 431 if (
> 432 SparkContext._active_spark_context is None
> 433 and SparkSession._instantiatedSession is None
> 434 ):
> File ~/Dev/spark/python/pyspark/sql/connect/__init__.py:21
>  18 """Currently Spark Connect is very experimental and the APIs to 
> interact with
>  19 Spark through this API are can be changed at any time without 
> warning."""
> ---> 21 from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>  22 from pyspark.sql.pandas.utils import (
>  23 require_minimum_pandas_version,
>  24 require_minimum_pyarrow_version,
>  25 require_minimum_grpc_version,
>  26 )
> File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:35
>  34 import random
> ---> 35 import pandas
>  36 import json
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:29
>  27 from typing import Any
> ---> 29 from pyspark.pandas.missing.general_functions import 
> MissingPandasLikeGeneralFunctions
>  30 from pyspark.pandas.missing.scalars import MissingPandasLikeScalars
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:34
>  33 try:
> ---> 34 require_minimum_pandas_version()
>  35 require_minimum_pyarrow_version()
> File ~/Dev/spark/python/pyspark/sql/pandas/utils.py:37, in 
> require_minimum_pandas_version()
>  34 raise ImportError(
>  35 "Pandas >= %s must be installed; however, " "it was not 
> found." % minimum_pandas_version
>  36 ) from raised_error
> ---> 37 if LooseVersion(pandas.__version__) < 
> LooseVersion(minimum_pandas_version):
>  38 raise ImportError(
>  39 "Pandas >= %s must be installed; however, "
>  40 "your version was %s." % (minimum_pandas_version, 
> pandas.__version__)
>  41 )
> 
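> 
> The error pattern itself is a plain circular import; a minimal standalone 
> repro with hypothetical module names (unrelated to pyspark's layout):
> {code:python}
> # a.py
> import b
> VERSION = "1.0"
> 
> # b.py
> import a
> print(a.VERSION)
> 
> # Running `python b.py` raises:
> # AttributeError: partially initialized module 'a' has no attribute
> # 'VERSION' (most likely due to a circular import)
> {code}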

[jira] [Assigned] (SPARK-42266) Local mode should work with IPython

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42266:


Assignee: Apache Spark

> Local mode should work with IPython
> ---
>
> Key: SPARK-42266
> URL: https://issues.apache.org/jira/browse/SPARK-42266
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> (spark_dev) ➜  spark git:(master) bin/pyspark --remote "local[*]"
> Python 3.9.15 (main, Nov 24 2022, 08:28:41) 
> Type 'copyright', 'credits' or 'license' for more information
> IPython 8.9.0 -- An enhanced Interactive Python. Type '?' for help.
> /Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py:45: UserWarning: 
> Failed to initialize Spark session.
>   warnings.warn("Failed to initialize Spark session.")
> Traceback (most recent call last):
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py", line 40, in 
> 
> spark = SparkSession.builder.getOrCreate()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 
> 429, in getOrCreate
> from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/__init__.py", line 
> 21, in 
> from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/dataframe.py", 
> line 35, in 
> import pandas
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", 
> line 29, in 
> from pyspark.pandas.missing.general_functions import 
> MissingPandasLikeGeneralFunctions
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", 
> line 34, in 
> require_minimum_pandas_version()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/pandas/utils.py", 
> line 37, in require_minimum_pandas_version
> if LooseVersion(pandas.__version__) < 
> LooseVersion(minimum_pandas_version):
> AttributeError: partially initialized module 'pandas' has no attribute 
> '__version__' (most likely due to a circular import)
> [TerminalIPythonApp] WARNING | Unknown error in handling PYTHONSTARTUP file 
> /Users/ruifeng.zheng/Dev/spark//python/pyspark/shell.py:
> ---
> AttributeErrorTraceback (most recent call last)
> File ~/Dev/spark/python/pyspark/shell.py:40
>  38 try:
>  39 # Creates pyspark.sql.connect.SparkSession.
> ---> 40 spark = SparkSession.builder.getOrCreate()
>  41 except Exception:
> File ~/Dev/spark/python/pyspark/sql/session.py:429, in 
> SparkSession.Builder.getOrCreate(self)
> 428 with SparkContext._lock:
> --> 429 from pyspark.sql.connect.session import SparkSession as 
> RemoteSparkSession
> 431 if (
> 432 SparkContext._active_spark_context is None
> 433 and SparkSession._instantiatedSession is None
> 434 ):
> File ~/Dev/spark/python/pyspark/sql/connect/__init__.py:21
>  18 """Currently Spark Connect is very experimental and the APIs to 
> interact with
>  19 Spark through this API are can be changed at any time without 
> warning."""
> ---> 21 from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>  22 from pyspark.sql.pandas.utils import (
>  23 require_minimum_pandas_version,
>  24 require_minimum_pyarrow_version,
>  25 require_minimum_grpc_version,
>  26 )
> File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:35
>  34 import random
> ---> 35 import pandas
>  36 import json
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:29
>  27 from typing import Any
> ---> 29 from pyspark.pandas.missing.general_functions import 
> MissingPandasLikeGeneralFunctions
>  30 from pyspark.pandas.missing.scalars import MissingPandasLikeScalars
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:34
>  33 try:
> ---> 34 require_minimum_pandas_version()
>  35 require_minimum_pyarrow_version()
> File ~/Dev/spark/python/pyspark/sql/pandas/utils.py:37, in 
> require_minimum_pandas_version()
>  34 raise ImportError(
>  35 "Pandas >= %s must be installed; however, " "it was not 
> found." % minimum_pandas_version
>  36 ) from raised_error
> ---> 37 if LooseVersion(pandas.__version__) < 
> LooseVersion(minimum_pandas_version):
>  38 raise ImportError(
>  39 "Pandas >= %s must be installed; however, "
>  40 "your version was %s." % (minimum_pandas_version, 
> pandas.__version__)
>  41 )
> AttributeError: partially initialized module 'pandas' has no attribute 
> 

[jira] [Assigned] (SPARK-42266) Local mode should work with IPython

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42266:


Assignee: (was: Apache Spark)

> Local mode should work with IPython
> ---
>
> Key: SPARK-42266
> URL: https://issues.apache.org/jira/browse/SPARK-42266
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> {code:java}
> (spark_dev) ➜  spark git:(master) bin/pyspark --remote "local[*]"
> Python 3.9.15 (main, Nov 24 2022, 08:28:41) 
> Type 'copyright', 'credits' or 'license' for more information
> IPython 8.9.0 -- An enhanced Interactive Python. Type '?' for help.
> /Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py:45: UserWarning: 
> Failed to initialize Spark session.
>   warnings.warn("Failed to initialize Spark session.")
> Traceback (most recent call last):
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py", line 40, in 
> 
> spark = SparkSession.builder.getOrCreate()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 
> 429, in getOrCreate
> from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/__init__.py", line 
> 21, in 
> from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/dataframe.py", 
> line 35, in 
> import pandas
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", 
> line 29, in 
> from pyspark.pandas.missing.general_functions import 
> MissingPandasLikeGeneralFunctions
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", 
> line 34, in 
> require_minimum_pandas_version()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/pandas/utils.py", 
> line 37, in require_minimum_pandas_version
> if LooseVersion(pandas.__version__) < 
> LooseVersion(minimum_pandas_version):
> AttributeError: partially initialized module 'pandas' has no attribute 
> '__version__' (most likely due to a circular import)
> [TerminalIPythonApp] WARNING | Unknown error in handling PYTHONSTARTUP file 
> /Users/ruifeng.zheng/Dev/spark//python/pyspark/shell.py:
> ---
> AttributeErrorTraceback (most recent call last)
> File ~/Dev/spark/python/pyspark/shell.py:40
>  38 try:
>  39 # Creates pyspark.sql.connect.SparkSession.
> ---> 40 spark = SparkSession.builder.getOrCreate()
>  41 except Exception:
> File ~/Dev/spark/python/pyspark/sql/session.py:429, in 
> SparkSession.Builder.getOrCreate(self)
> 428 with SparkContext._lock:
> --> 429 from pyspark.sql.connect.session import SparkSession as 
> RemoteSparkSession
> 431 if (
> 432 SparkContext._active_spark_context is None
> 433 and SparkSession._instantiatedSession is None
> 434 ):
> File ~/Dev/spark/python/pyspark/sql/connect/__init__.py:21
>  18 """Currently Spark Connect is very experimental and the APIs to 
> interact with
>  19 Spark through this API are can be changed at any time without 
> warning."""
> ---> 21 from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>  22 from pyspark.sql.pandas.utils import (
>  23 require_minimum_pandas_version,
>  24 require_minimum_pyarrow_version,
>  25 require_minimum_grpc_version,
>  26 )
> File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:35
>  34 import random
> ---> 35 import pandas
>  36 import json
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:29
>  27 from typing import Any
> ---> 29 from pyspark.pandas.missing.general_functions import 
> MissingPandasLikeGeneralFunctions
>  30 from pyspark.pandas.missing.scalars import MissingPandasLikeScalars
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:34
>  33 try:
> ---> 34 require_minimum_pandas_version()
>  35 require_minimum_pyarrow_version()
> File ~/Dev/spark/python/pyspark/sql/pandas/utils.py:37, in 
> require_minimum_pandas_version()
>  34 raise ImportError(
>  35 "Pandas >= %s must be installed; however, " "it was not 
> found." % minimum_pandas_version
>  36 ) from raised_error
> ---> 37 if LooseVersion(pandas.__version__) < 
> LooseVersion(minimum_pandas_version):
>  38 raise ImportError(
>  39 "Pandas >= %s must be installed; however, "
>  40 "your version was %s." % (minimum_pandas_version, 
> pandas.__version__)
>  41 )
> AttributeError: partially initialized module 'pandas' has no attribute 
> '__version__' (most likely 

[jira] [Assigned] (SPARK-42708) The generated protobuf java file is too large

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42708:


Assignee: (was: Apache Spark)

> The generated protobuf java file is too large
> -
>
> Key: SPARK-42708
> URL: https://issues.apache.org/jira/browse/SPARK-42708
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Jia Fan
>Priority: Trivial
>
> The generated protobuf Java files in this project are too large to be indexed 
> by IDEA, so programs can't be run from IDEA. The way to fix this is to change 
> IDEA's idea.max.intellisense.filesize value to 1. I couldn't find this 
> in the project README before searching, so I want to document it for newcomers to Spark.
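> 
> For reference, the setting lives in IDEA's custom properties (Help > Edit 
> Custom Properties...); the value below is a placeholder, not a recommendation:
> {code}
> # idea.properties -- maximum file size (in KB) that IDEA will index
> idea.max.intellisense.filesize=<size-in-KB>
> {code}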



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42708) The generated protobuf java file is too large

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697719#comment-17697719
 ] 

Apache Spark commented on SPARK-42708:
--

User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/40326

> The generated protobuf java file is too large
> -
>
> Key: SPARK-42708
> URL: https://issues.apache.org/jira/browse/SPARK-42708
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Jia Fan
>Priority: Trivial
>
> The generated protobuf Java files in this project are too large to be indexed 
> by IDEA, so programs can't be run from IDEA. The way to fix this is to change 
> IDEA's idea.max.intellisense.filesize value to 1. I couldn't find this 
> in the project README before searching, so I want to document it for newcomers to Spark.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42708) The generated protobuf java file is too large

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42708:


Assignee: Apache Spark

> The generated protobuf java file is too large
> -
>
> Key: SPARK-42708
> URL: https://issues.apache.org/jira/browse/SPARK-42708
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Jia Fan
>Assignee: Apache Spark
>Priority: Trivial
>
> The generated protobuf Java files in this project are too large to be indexed 
> by IDEA, so programs can't be run from IDEA. The way to fix this is to change 
> IDEA's idea.max.intellisense.filesize value to 1. I couldn't find this 
> in the project README before searching, so I want to document it for newcomers to Spark.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42707) Remove experimental warning in developer documentation

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697717#comment-17697717
 ] 

Apache Spark commented on SPARK-42707:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40325

> Remove experimental warning in developer documentation
> --
>
> Key: SPARK-42707
> URL: https://issues.apache.org/jira/browse/SPARK-42707
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> diff --git a/connector/connect/README.md b/connector/connect/README.md
> index 6567daf5504..dfe49cea3df 100644
> --- a/connector/connect/README.md
> +++ b/connector/connect/README.md
> @@ -1,8 +1,5 @@
>  # Spark Connect
> -**Spark Connect is a strictly experimental feature and under heavy 
> development.
> -All APIs should be considered volatile and should not be used in 
> production.**
> -
>  This module contains the implementation of Spark Connect which is a logical 
> plan
>  facade for the implementation in Spark. Spark Connect is directly integrated 
> into the build
>  of Spark.
> diff --git a/python/pyspark/sql/connect/__init__.py 
> b/python/pyspark/sql/connect/__init__.py
> index 9bd4513db22..8b5d30e214c 100644
> --- a/python/pyspark/sql/connect/__init__.py
> +++ b/python/pyspark/sql/connect/__init__.py
> @@ -15,5 +15,4 @@
>  # limitations under the License.
>  #
> -"""Currently Spark Connect is very experimental and the APIs to interact with
> -Spark through this API are can be changed at any time without warning."""
> +"""Spark Connect cleint"""
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42707) Remove experimental warning in developer documentation

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42707:


Assignee: Apache Spark

> Remove experimental warning in developer documentation
> --
>
> Key: SPARK-42707
> URL: https://issues.apache.org/jira/browse/SPARK-42707
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> {code}
> diff --git a/connector/connect/README.md b/connector/connect/README.md
> index 6567daf5504..dfe49cea3df 100644
> --- a/connector/connect/README.md
> +++ b/connector/connect/README.md
> @@ -1,8 +1,5 @@
>  # Spark Connect
> -**Spark Connect is a strictly experimental feature and under heavy 
> development.
> -All APIs should be considered volatile and should not be used in 
> production.**
> -
>  This module contains the implementation of Spark Connect which is a logical 
> plan
>  facade for the implementation in Spark. Spark Connect is directly integrated 
> into the build
>  of Spark.
> diff --git a/python/pyspark/sql/connect/__init__.py 
> b/python/pyspark/sql/connect/__init__.py
> index 9bd4513db22..8b5d30e214c 100644
> --- a/python/pyspark/sql/connect/__init__.py
> +++ b/python/pyspark/sql/connect/__init__.py
> @@ -15,5 +15,4 @@
>  # limitations under the License.
>  #
> -"""Currently Spark Connect is very experimental and the APIs to interact with
> -Spark through this API are can be changed at any time without warning."""
> +"""Spark Connect cleint"""
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42707) Remove experimental warning in developer documentation

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42707:


Assignee: (was: Apache Spark)

> Remove experimental warning in developer documentation
> --
>
> Key: SPARK-42707
> URL: https://issues.apache.org/jira/browse/SPARK-42707
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> diff --git a/connector/connect/README.md b/connector/connect/README.md
> index 6567daf5504..dfe49cea3df 100644
> --- a/connector/connect/README.md
> +++ b/connector/connect/README.md
> @@ -1,8 +1,5 @@
>  # Spark Connect
> -**Spark Connect is a strictly experimental feature and under heavy 
> development.
> -All APIs should be considered volatile and should not be used in 
> production.**
> -
>  This module contains the implementation of Spark Connect which is a logical 
> plan
>  facade for the implementation in Spark. Spark Connect is directly integrated 
> into the build
>  of Spark.
> diff --git a/python/pyspark/sql/connect/__init__.py 
> b/python/pyspark/sql/connect/__init__.py
> index 9bd4513db22..8b5d30e214c 100644
> --- a/python/pyspark/sql/connect/__init__.py
> +++ b/python/pyspark/sql/connect/__init__.py
> @@ -15,5 +15,4 @@
>  # limitations under the License.
>  #
> -"""Currently Spark Connect is very experimental and the APIs to interact with
> -Spark through this API are can be changed at any time without warning."""
> +"""Spark Connect cleint"""
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42496) Introduction of Spark Connect at main page.

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697699#comment-17697699
 ] 

Apache Spark commented on SPARK-42496:
--

User 'allanf-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40324

> Introduction of Spark Connect at main page.
> 
>
> Key: SPARK-42496
> URL: https://issues.apache.org/jira/browse/SPARK-42496
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should document the introduction of Spark Connect on the PySpark main 
> documentation page to give users a summary.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42705) SparkSession.sql doesn't return values from commands.

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42705:


Assignee: Apache Spark

> SparkSession.sql doesn't return values from commands.
> -
>
> Key: SPARK-42705
> URL: https://issues.apache.org/jira/browse/SPARK-42705
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> {code:python}
> >>> spark.sql("show functions").show()
> ++
> |function|
> ++
> ++
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42705) SparkSession.sql doesn't return values from commands.

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697660#comment-17697660
 ] 

Apache Spark commented on SPARK-42705:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40323

> SparkSession.sql doesn't return values from commands.
> -
>
> Key: SPARK-42705
> URL: https://issues.apache.org/jira/browse/SPARK-42705
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> {code:python}
> >>> spark.sql("show functions").show()
> ++
> |function|
> ++
> ++
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42705) SparkSession.sql doesn't return values from commands.

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42705:


Assignee: (was: Apache Spark)

> SparkSession.sql doesn't return values from commands.
> -
>
> Key: SPARK-42705
> URL: https://issues.apache.org/jira/browse/SPARK-42705
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> {code:python}
> >>> spark.sql("show functions").show()
> ++
> |function|
> ++
> ++
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41775) Implement training functions as input

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697658#comment-17697658
 ] 

Apache Spark commented on SPARK-41775:
--

User 'rithwik-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40322

> Implement training functions as input
> -
>
> Key: SPARK-41775
> URL: https://issues.apache.org/jira/browse/SPARK-41775
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Assignee: Rithwik Ediga Lakhamsani
>Priority: Major
> Fix For: 3.4.0
>
>
> Sidenote: make formatting updates described in 
> https://github.com/apache/spark/pull/39188
>  
> Currently, `Distributor().run(...)` takes only files as input. Now we will 
> add functionality to take in functions as well. This will 
> require us to go through the following process on each task in the executor 
> nodes:
> 1. take the input function and args and pickle them
> 2. Create a temp train.py file that looks like
> {code:python}
> import cloudpickle
> import os
> 
> if __name__ == "__main__":
>     # tempdir is substituted when the file is generated
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> {code}
> 3. Run that train.py file with `torchrun`
> 4. Check if `train_output.pkl` has been created by the process with partitionId == 
> 0; if it has, then deserialize it and return that output through `.collect()`
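> 
> A rough driver-side counterpart of steps 1 and 3-4 (hypothetical helper, not 
> the actual Distributor internals):
> {code:python}
> import os
> import subprocess
> import tempfile
> 
> import cloudpickle
> 
> def run_with_torchrun(train, args, nproc_per_node=1):
>     tempdir = tempfile.mkdtemp()
>     # 1. pickle the user function and its arguments
>     with open(os.path.join(tempdir, "train_input.pkl"), "wb") as f:
>         cloudpickle.dump((train, args), f)
>     # 2. (not shown) write the generated train.py from above into tempdir
>     # 3. run that file with torchrun
>     subprocess.run(
>         ["torchrun", f"--nproc_per_node={nproc_per_node}",
>          os.path.join(tempdir, "train.py")],
>         check=True,
>     )
>     # 4. deserialize the rank-0 output if it was produced
>     out_path = os.path.join(tempdir, "train_output.pkl")
>     if os.path.exists(out_path):
>         with open(out_path, "rb") as f:
>             return cloudpickle.load(f)
>     return None
> {code}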



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42704) SubqueryAlias should propagate metadata columns its child already selects

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42704:


Assignee: Apache Spark

> SubqueryAlias should propagate metadata columns its child already selects 
> --
>
> Key: SPARK-42704
> URL: https://issues.apache.org/jira/browse/SPARK-42704
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Ryan Johnson
>Assignee: Apache Spark
>Priority: Major
>
> The `AddMetadataColumns` analyzer rule intends to make available 
> metadata columns resolvable, even if the plan already contains projections that did not 
> explicitly mention the metadata column.
> The `SubqueryAlias` plan node intentionally does not propagate metadata 
> columns automatically from a non-leaf/non-subquery child node, because the 
> following should _not_ work:
>  
> {code:java}
> spark.read.table("t").select("a", "b").as("s").select("_metadata"){code}
> However, today it is too strict and breaks the metadata chain when the 
> child node's output already includes the metadata column:
>  
> {code:java}
> // expected to work (and does)
> spark.read.table("t")
>   .select("a", "b").select("_metadata")
> // by extension, should also work (but does not)
> spark.read.table("t").select("a", "b", "_metadata").as("s")
>   .select("a", "b").select("_metadata"){code}
> The solution is for `SubqueryAlias` to always propagate metadata columns that 
> are already in the child's output, thus preserving the `metadataOutput` chain 
> for that column.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42704) SubqueryAlias should propagate metadata columns its child already selects

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697565#comment-17697565
 ] 

Apache Spark commented on SPARK-42704:
--

User 'ryan-johnson-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/40321

> SubqueryAlias should propagate metadata columns its child already selects 
> --
>
> Key: SPARK-42704
> URL: https://issues.apache.org/jira/browse/SPARK-42704
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Ryan Johnson
>Priority: Major
>
> The `AddMetadataColumns` analyzer rule intends to make available 
> metadata columns resolvable, even if the plan already contains projections that did not 
> explicitly mention the metadata column.
> The `SubqueryAlias` plan node intentionally does not propagate metadata 
> columns automatically from a non-leaf/non-subquery child node, because the 
> following should _not_ work:
>  
> {code:java}
> spark.read.table("t").select("a", "b").as("s").select("_metadata"){code}
> However, today it is too strict and breaks the metadata chain when the 
> child node's output already includes the metadata column:
>  
> {code:java}
> // expected to work (and does)
> spark.read.table("t")
>   .select("a", "b").select("_metadata")
> // by extension, should also work (but does not)
> spark.read.table("t").select("a", "b", "_metadata").as("s")
>   .select("a", "b").select("_metadata"){code}
> The solution is for `SubqueryAlias` to always propagate metadata columns that 
> are already in the child's output, thus preserving the `metadataOutput` chain 
> for that column.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42704) SubqueryAlias should propagate metadata columns its child already selects

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42704:


Assignee: (was: Apache Spark)

> SubqueryAlias should propagate metadata columns its child already selects 
> --
>
> Key: SPARK-42704
> URL: https://issues.apache.org/jira/browse/SPARK-42704
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Ryan Johnson
>Priority: Major
>
> The `AddMetadataColumns` analyzer rule intends to make available 
> metadata columns resolvable, even if the plan already contains projections that did not 
> explicitly mention the metadata column.
> The `SubqueryAlias` plan node intentionally does not propagate metadata 
> columns automatically from a non-leaf/non-subquery child node, because the 
> following should _not_ work:
>  
> {code:java}
> spark.read.table("t").select("a", "b").as("s").select("_metadata"){code}
> However, today it is too strict and breaks the metadata chain when the 
> child node's output already includes the metadata column:
>  
> {code:java}
> // expected to work (and does)
> spark.read.table("t")
>   .select("a", "b").select("_metadata")
> // by extension, should also work (but does not)
> spark.read.table("t").select("a", "b", "_metadata").as("s")
>   .select("a", "b").select("_metadata"){code}
> The solution is for `SubqueryAlias` to always propagate metadata columns that 
> are already in the child's output, thus preserving the `metadataOutput` chain 
> for that column.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42692) Implement Dataset.toJson

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42692:


Assignee: Apache Spark

> Implement Dataset.toJson
> 
>
> Key: SPARK-42692
> URL: https://issues.apache.org/jira/browse/SPARK-42692
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>
> Implement Dataset.toJSON:
>  
> {code:java}
> /**
> * Returns the content of the Dataset as a Dataset of JSON strings.
> * @since 3.4.0
> */
> def toJSON: Dataset[String]{code}
>  
> Please see if we can implement this using 
> {{project(to_json(struct(*))).as(StringEncoder)}}.
>  
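> A minimal sketch of that suggestion against the classic Dataset API (the 
> Connect client would express the same projection through its plan builder):
> {code:java}
> import org.apache.spark.sql.{DataFrame, Dataset, Encoders}
> import org.apache.spark.sql.functions.{col, struct, to_json}
> 
> // Pack all columns into a struct, render it as JSON, and re-type the
> // single resulting string column as a Dataset[String].
> def toJsonSketch(df: DataFrame): Dataset[String] =
>   df.select(to_json(struct(col("*")))).as(Encoders.STRING)
> {code}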



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42692) Implement Dataset.toJson

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697467#comment-17697467
 ] 

Apache Spark commented on SPARK-42692:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40319

> Implement Dataset.toJson
> 
>
> Key: SPARK-42692
> URL: https://issues.apache.org/jira/browse/SPARK-42692
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Implement Dataset.toJSON:
>  
> {code:java}
> /**
> * Returns the content of the Dataset as a Dataset of JSON strings.
> * @since 3.4.0
> */
> def toJSON: Dataset[String]{code}
>  
> Please see if we can implement this using 
> {{project(to_json(struct(*))).as(StringEncoder)}}.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42692) Implement Dataset.toJson

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42692:


Assignee: (was: Apache Spark)

> Implement Dataset.toJson
> 
>
> Key: SPARK-42692
> URL: https://issues.apache.org/jira/browse/SPARK-42692
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Implement Dataset.toJSON:
>  
> {code:java}
> /**
> * Returns the content of the Dataset as a Dataset of JSON strings.
> * @since 3.4.0
> */
> def toJSON: Dataset[String]{code}
>  
> Please see if we can implement this using 
> {{project(to_json(struct(*))).as(StringEncoder)}}.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42656) Spark Connect Scala Client Shell Script

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697423#comment-17697423
 ] 

Apache Spark commented on SPARK-42656:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40318

> Spark Connect Scala Client Shell Script
> ---
>
> Key: SPARK-42656
> URL: https://issues.apache.org/jira/browse/SPARK-42656
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Adding a shell script to run the Scala client in a Scala REPL, allowing users to 
> connect to Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42700) Add h2 as test dependency of connect-server module

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697417#comment-17697417
 ] 

Apache Spark commented on SPARK-42700:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40317

> Add h2 as test dependency of connect-server module
> --
>
> Key: SPARK-42700
> URL: https://issues.apache.org/jira/browse/SPARK-42700
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> run 
>  # mvn clean install -DskipTests -pl connector/connect/server -am
>  # mvn test -pl connector/connect/server
> {code:java}
> *** RUN ABORTED ***
>   java.lang.ClassNotFoundException: org.h2.Driver
>   at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
>   at java.base/java.lang.Class.forName0(Native Method)
>   at java.base/java.lang.Class.forName(Class.java:398)
>   at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
>   at 
> org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite.beforeAll(ProtoToParsedPlanTestSuite.scala:68)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   ...
>  {code}
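> 
> The fix this implies is a test-scoped dependency in the connect-server 
> pom.xml (sketch; the version would come from the parent pom's dependency 
> management):
> {code:xml}
> <dependency>
>   <groupId>com.h2database</groupId>
>   <artifactId>h2</artifactId>
>   <scope>test</scope>
> </dependency>
> {code}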



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42700) Add h2 as test dependency of connect-server module

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42700:


Assignee: (was: Apache Spark)

> Add h2 as test dependency of connect-server module
> --
>
> Key: SPARK-42700
> URL: https://issues.apache.org/jira/browse/SPARK-42700
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> Run:
>  # mvn clean install -DskipTests -pl connector/connect/server -am
>  # mvn test -pl connector/connect/server
> {code:java}
> *** RUN ABORTED ***
>   java.lang.ClassNotFoundException: org.h2.Driver
>   at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
>   at java.base/java.lang.Class.forName0(Native Method)
>   at java.base/java.lang.Class.forName(Class.java:398)
>   at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
>   at 
> org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite.beforeAll(ProtoToParsedPlanTestSuite.scala:68)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   ...
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42700) Add h2 as test dependency of connect-server module

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42700:


Assignee: Apache Spark

> Add h2 as test dependency of connect-server module
> --
>
> Key: SPARK-42700
> URL: https://issues.apache.org/jira/browse/SPARK-42700
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> Run:
>  # mvn clean install -DskipTests -pl connector/connect/server -am
>  # mvn test -pl connector/connect/server
> {code:java}
> *** RUN ABORTED ***
>   java.lang.ClassNotFoundException: org.h2.Driver
>   at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
>   at java.base/java.lang.Class.forName0(Native Method)
>   at java.base/java.lang.Class.forName(Class.java:398)
>   at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
>   at 
> org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite.beforeAll(ProtoToParsedPlanTestSuite.scala:68)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   ...
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42679) createDataFrame doesn't work with non-nullable schema.

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42679:


Assignee: Apache Spark

> createDataFrame doesn't work with non-nullable schema.
> --
>
> Key: SPARK-42679
> URL: https://issues.apache.org/jira/browse/SPARK-42679
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> spark.createDataFrame won't work with a non-nullable schema, as shown below:
> {code:java}
> from pyspark.sql.types import *
> schema_false = StructType([StructField("id", IntegerType(), False)])
> spark.createDataFrame([[1]], schema=schema_false)
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.connect.AnalysisException: 
> [NULLABLE_COLUMN_OR_FIELD] Column or field `id` is nullable while it's 
> required to be non-nullable.{code}
> whereas it works fine with a nullable schema:
> {code:java}
> schema_true = StructType([StructField("id", IntegerType(), True)])
> spark.createDataFrame([[1]], schema=schema_true)
> DataFrame[id: int]{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42679) createDataFrame doesn't work with non-nullable schema.

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697412#comment-17697412
 ] 

Apache Spark commented on SPARK-42679:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40316

> createDataFrame doesn't work with non-nullable schema.
> --
>
> Key: SPARK-42679
> URL: https://issues.apache.org/jira/browse/SPARK-42679
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> spark.createDataFrame won't work with a non-nullable schema, as shown below:
> {code:java}
> from pyspark.sql.types import *
> schema_false = StructType([StructField("id", IntegerType(), False)])
> spark.createDataFrame([[1]], schema=schema_false)
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.connect.AnalysisException: 
> [NULLABLE_COLUMN_OR_FIELD] Column or field `id` is nullable while it's 
> required to be non-nullable.{code}
> whereas it works fine with a nullable schema:
> {code:java}
> schema_true = StructType([StructField("id", IntegerType(), True)])
> spark.createDataFrame([[1]], schema=schema_true)
> DataFrame[id: int]{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42679) createDataFrame doesn't work with non-nullable schema.

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42679:


Assignee: (was: Apache Spark)

> createDataFrame doesn't work with non-nullable schema.
> --
>
> Key: SPARK-42679
> URL: https://issues.apache.org/jira/browse/SPARK-42679
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> spark.createDataFrame won't work with a non-nullable schema, as shown below:
> {code:java}
> from pyspark.sql.types import *
> schema_false = StructType([StructField("id", IntegerType(), False)])
> spark.createDataFrame([[1]], schema=schema_false)
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.connect.AnalysisException: 
> [NULLABLE_COLUMN_OR_FIELD] Column or field `id` is nullable while it's 
> required to be non-nullable.{code}
> whereas it works fine with a nullable schema:
> {code:java}
> schema_true = StructType([StructField("id", IntegerType(), True)])
> spark.createDataFrame([[1]], schema=schema_true)
> DataFrame[id: int]{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42699) SparkConnectServer should make client and AM same exit code

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697373#comment-17697373
 ] 

Apache Spark commented on SPARK-42699:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/40315

> SparkConnectServer should make client and AM same exit code
> ---
>
> Key: SPARK-42699
> URL: https://issues.apache.org/jira/browse/SPARK-42699
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Spark Core
>Affects Versions: 3.5.0
>Reporter: angerszhu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42699) SparkConnectServer should make client and AM same exit code

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42699:


Assignee: Apache Spark

> SparkConnectServer should make client and AM same exit code
> ---
>
> Key: SPARK-42699
> URL: https://issues.apache.org/jira/browse/SPARK-42699
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Spark Core
>Affects Versions: 3.5.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42699) SparkConnectServer should make client and AM same exit code

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42699:


Assignee: (was: Apache Spark)

> SparkConnectServer should make client and AM same exit code
> ---
>
> Key: SPARK-42699
> URL: https://issues.apache.org/jira/browse/SPARK-42699
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Spark Core
>Affects Versions: 3.5.0
>Reporter: angerszhu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42698) Client mode submit task client should keep same exitcode with AM

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42698:


Assignee: (was: Apache Spark)

> Client mode submit task client should keep same exitcode with AM
> 
>
> Key: SPARK-42698
> URL: https://issues.apache.org/jira/browse/SPARK-42698
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.5.0
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> try {
>   app.start(childArgs.toArray, sparkConf)
> } catch {
>   case t: Throwable =>
>     throw findCause(t)
> } finally {
>   if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
>       !isThriftServer(args.mainClass)) {
>     try {
>       SparkContext.getActive.foreach(_.stop())
>     } catch {
>       case e: Throwable => logError(s"Failed to close SparkContext: $e")
>     }
>   }
> }
> {code}
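A hedged sketch of the intended behavior; the helper and the status-to-exit-code mapping are assumptions for illustration, not Spark's actual implementation:

{code:scala}
// Illustrative only: derive the client JVM's exit code from the YARN AM's
// final status, so that in client mode the submitting process and the AM agree.
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus

def exitCodeFor(status: FinalApplicationStatus): Int = status match {
  case FinalApplicationStatus.SUCCEEDED => 0
  case _ => 1 // FAILED, KILLED, UNDEFINED: surface a non-zero exit code
}
{code}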



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42698) Client mode submit task client should keep same exitcode with AM

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42698:


Assignee: Apache Spark

> Client mode submit task client should keep same exitcode with AM
> 
>
> Key: SPARK-42698
> URL: https://issues.apache.org/jira/browse/SPARK-42698
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.5.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> try {
>   app.start(childArgs.toArray, sparkConf)
> } catch {
>   case t: Throwable =>
>     throw findCause(t)
> } finally {
>   if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
>       !isThriftServer(args.mainClass)) {
>     try {
>       SparkContext.getActive.foreach(_.stop())
>     } catch {
>       case e: Throwable => logError(s"Failed to close SparkContext: $e")
>     }
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42698) Client mode submit task client should keep same exitcode with AM

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697364#comment-17697364
 ] 

Apache Spark commented on SPARK-42698:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/40314

> Client mode submit task client should keep same exitcode with AM
> 
>
> Key: SPARK-42698
> URL: https://issues.apache.org/jira/browse/SPARK-42698
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.5.0
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> try {
>   app.start(childArgs.toArray, sparkConf)
> } catch {
>   case t: Throwable =>
>     throw findCause(t)
> } finally {
>   if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
>       !isThriftServer(args.mainClass)) {
>     try {
>       SparkContext.getActive.foreach(_.stop())
>     } catch {
>       case e: Throwable => logError(s"Failed to close SparkContext: $e")
>     }
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42698) Client mode submit task client should keep same exitcode with AM

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697365#comment-17697365
 ] 

Apache Spark commented on SPARK-42698:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/40314

> Client mode submit task client should keep same exitcode with AM
> 
>
> Key: SPARK-42698
> URL: https://issues.apache.org/jira/browse/SPARK-42698
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.5.0
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> try {
>   app.start(childArgs.toArray, sparkConf)
> } catch {
>   case t: Throwable =>
>     throw findCause(t)
> } finally {
>   if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
>       !isThriftServer(args.mainClass)) {
>     try {
>       SparkContext.getActive.foreach(_.stop())
>     } catch {
>       case e: Throwable => logError(s"Failed to close SparkContext: $e")
>     }
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42697) /api/v1/applications return 0 for duration

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697356#comment-17697356
 ] 

Apache Spark commented on SPARK-42697:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/40313

> /api/v1/applications return 0 for duration
> --
>
> Key: SPARK-42697
> URL: https://issues.apache.org/jira/browse/SPARK-42697
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.1.3, 3.2.3, 3.3.2, 3.4.0
>Reporter: Kent Yao
>Priority: Major
>
> The duration field should report the application's total uptime instead of 0.
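A hedged sketch of the expected semantics; the field names are modeled on the REST API's JSON, not on Spark's internals:

{code:scala}
// Illustrative only: while an application is running, duration should be its
// total uptime (now - start) rather than 0; once completed, end - start.
case class AttemptInfo(startTimeMs: Long, endTimeMs: Long, completed: Boolean)

def durationMs(a: AttemptInfo, nowMs: Long): Long =
  if (a.completed) a.endTimeMs - a.startTimeMs else nowMs - a.startTimeMs
{code}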



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42697) /api/v1/applications return 0 for duration

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42697:


Assignee: (was: Apache Spark)

> /api/v1/applications return 0 for duration
> --
>
> Key: SPARK-42697
> URL: https://issues.apache.org/jira/browse/SPARK-42697
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.1.3, 3.2.3, 3.3.2, 3.4.0
>Reporter: Kent Yao
>Priority: Major
>
> The duration field should report the application's total uptime instead of 0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42697) /api/v1/applications return 0 for duration

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42697:


Assignee: Apache Spark

> /api/v1/applications return 0 for duration
> --
>
> Key: SPARK-42697
> URL: https://issues.apache.org/jira/browse/SPARK-42697
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.1.3, 3.2.3, 3.3.2, 3.4.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Major
>
> The duration field should report the application's total uptime instead of 0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42695) Skew join handling in stream side of broadcast hash join

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42695:


Assignee: Apache Spark

> Skew join handling in stream side of broadcast hash join
> 
>
> Key: SPARK-42695
> URL: https://issues.apache.org/jira/browse/SPARK-42695
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Xingchao, Zhang
>Assignee: Apache Spark
>Priority: Major
> Attachments: before-01.png
>
>
> We can extend the current OptimizeSkewedJoin if data skew is detected on the 
> stream side of a broadcast hash join.
>  
> !before-01.png|width=609,height=626!
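A hedged sketch of the skew test such an extension would likely reuse; the factor and threshold mirror AQE's defaults for the spark.sql.adaptive.skewJoin.* configs, but the helper itself is illustrative:

{code:scala}
// A partition counts as skewed when it is both much larger than the median
// partition and above an absolute size floor (default values assumed here).
def isSkewed(
    sizeBytes: Long,
    medianBytes: Long,
    factor: Double = 5.0,
    thresholdBytes: Long = 256L * 1024 * 1024): Boolean =
  sizeBytes > medianBytes * factor && sizeBytes > thresholdBytes
{code}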



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42695) Skew join handling in stream side of broadcast hash join

2023-03-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697306#comment-17697306
 ] 

Apache Spark commented on SPARK-42695:
--

User 'xingchaozh' has created a pull request for this issue:
https://github.com/apache/spark/pull/40312

> Skew join handling in stream side of broadcast hash join
> 
>
> Key: SPARK-42695
> URL: https://issues.apache.org/jira/browse/SPARK-42695
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Xingchao, Zhang
>Priority: Major
> Attachments: before-01.png
>
>
> We can extend the current OptimizeSkewedJoin if data skew is detected on the 
> stream side of a broadcast hash join.
>  
> !before-01.png|width=609,height=626!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42695) Skew join handling in stream side of broadcast hash join

2023-03-07 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42695:


Assignee: (was: Apache Spark)

> Skew join handling in stream side of broadcast hash join
> 
>
> Key: SPARK-42695
> URL: https://issues.apache.org/jira/browse/SPARK-42695
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Xingchao, Zhang
>Priority: Major
> Attachments: before-01.png
>
>
> We can extend the current OptimizeSkewedJoin if data skew is detected on the 
> stream side of a broadcast hash join.
>  
> !before-01.png|width=609,height=626!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42559) Implement DataFrameNaFunctions

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697249#comment-17697249
 ] 

Apache Spark commented on SPARK-42559:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40311

> Implement DataFrameNaFunctions
> --
>
> Key: SPARK-42559
> URL: https://issues.apache.org/jira/browse/SPARK-42559
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: BingKun Pan
>Priority: Major
> Fix For: 3.4.1
>
>
> Implement DataFrameNaFunctions for connect and hook it up to Dataset.
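A hedged usage sketch of the surface being wired up; the column names are placeholders, not from the issue:

{code:scala}
// Illustrative only: once DataFrameNaFunctions is hooked up to the Connect
// Dataset, the familiar na.* chain should work against a Connect session.
import org.apache.spark.sql.DataFrame

def clean(df: DataFrame): DataFrame =
  df.na.fill(0, Seq("age"))      // replace nulls in an assumed numeric column
    .na.drop("any", Seq("name")) // drop rows where the assumed "name" column is null
{code}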



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42022) createDataFrame should autogenerate missing column names

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697188#comment-17697188
 ] 

Apache Spark commented on SPARK-42022:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40310

> createDataFrame should autogenerate missing column names
> 
>
> Key: SPARK-42022
> URL: https://issues.apache.org/jira/browse/SPARK-42022
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> pyspark/sql/tests/test_types.py:233 
> (TypesParityTests.test_infer_schema_not_enough_names)
> ['col1', '_2'] != ['col1']
> Expected :['col1']
> Actual   :['col1', '_2']
> 
> self = <TypesParityTests testMethod=test_infer_schema_not_enough_names>
> def test_infer_schema_not_enough_names(self):
> df = self.spark.createDataFrame([["a", "b"]], ["col1"])
> >   self.assertEqual(df.columns, ["col1", "_2"])
> ../test_types.py:236: AssertionError
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42022) createDataFrame should autogenerate missing column names

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42022:


Assignee: (was: Apache Spark)

> createDataFrame should autogenerate missing column names
> 
>
> Key: SPARK-42022
> URL: https://issues.apache.org/jira/browse/SPARK-42022
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> pyspark/sql/tests/test_types.py:233 
> (TypesParityTests.test_infer_schema_not_enough_names)
> ['col1', '_2'] != ['col1']
> Expected :['col1']
> Actual   :['col1', '_2']
> 
> self = <TypesParityTests testMethod=test_infer_schema_not_enough_names>
> def test_infer_schema_not_enough_names(self):
> df = self.spark.createDataFrame([["a", "b"]], ["col1"])
> >   self.assertEqual(df.columns, ["col1", "_2"])
> ../test_types.py:236: AssertionError
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42022) createDataFrame should autogenerate missing column names

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42022:


Assignee: Apache Spark

> createDataFrame should autogenerate missing column names
> 
>
> Key: SPARK-42022
> URL: https://issues.apache.org/jira/browse/SPARK-42022
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> {code}
> pyspark/sql/tests/test_types.py:233 
> (TypesParityTests.test_infer_schema_not_enough_names)
> ['col1', '_2'] != ['col1']
> Expected :['col1']
> Actual   :['col1', '_2']
> 
> self = <TypesParityTests testMethod=test_infer_schema_not_enough_names>
> def test_infer_schema_not_enough_names(self):
> df = self.spark.createDataFrame([["a", "b"]], ["col1"])
> >   self.assertEqual(df.columns, ["col1", "_2"])
> ../test_types.py:236: AssertionError
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42688) Rename Connect proto Request client_id to session_id

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42688:


Assignee: (was: Apache Spark)

> Rename Connect proto Request client_id to session_id
> 
>
> Key: SPARK-42688
> URL: https://issues.apache.org/jira/browse/SPARK-42688
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42688) Rename Connect proto Request client_id to session_id

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697163#comment-17697163
 ] 

Apache Spark commented on SPARK-42688:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40309

> Rename Connect proto Request client_id to session_id
> 
>
> Key: SPARK-42688
> URL: https://issues.apache.org/jira/browse/SPARK-42688
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42688) Rename Connect proto Request client_id to session_id

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42688:


Assignee: Apache Spark

> Rename Connect proto Request client_id to session_id
> 
>
> Key: SPARK-42688
> URL: https://issues.apache.org/jira/browse/SPARK-42688
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42656) Spark Connect Scala Client Shell Script

2023-03-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697162#comment-17697162
 ] 

Apache Spark commented on SPARK-42656:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40305

> Spark Connect Scala Client Shell Script
> ---
>
> Key: SPARK-42656
> URL: https://issues.apache.org/jira/browse/SPARK-42656
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Add a shell script that runs the Scala client in a Scala REPL, allowing users 
> to connect to Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42151) Align UPDATE assignments with table attributes

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42151:


Assignee: (was: Apache Spark)

> Align UPDATE assignments with table attributes
> --
>
> Key: SPARK-42151
> URL: https://issues.apache.org/jira/browse/SPARK-42151
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> Assignments in UPDATE commands should be aligned with table attributes prior 
> to rewriting those UPDATE commands.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42151) Align UPDATE assignments with table attributes

2023-03-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42151:


Assignee: Apache Spark

> Align UPDATE assignments with table attributes
> --
>
> Key: SPARK-42151
> URL: https://issues.apache.org/jira/browse/SPARK-42151
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Anton Okolnychyi
>Assignee: Apache Spark
>Priority: Major
>
> Assignments in UPDATE commands should be aligned with table attributes prior 
> to rewriting those UPDATE commands.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


