[jira] [Assigned] (SPARK-46641) Add maxBytesPerTrigger threshold option
Dongjoon Hyun reassigned SPARK-46641:
Assignee: Maksim Konstantinov

> Add maxBytesPerTrigger threshold option
> Key: SPARK-46641
> URL: https://issues.apache.org/jira/browse/SPARK-46641
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Maksim Konstantinov
> Assignee: Maksim Konstantinov
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0

--
This message was sent by Atlassian Jira (v8.20.10#820010)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46641) Add maxBytesPerTrigger threshold option
Dongjoon Hyun resolved SPARK-46641:
Resolution: Fixed
Issue resolved by pull request 44636
[https://github.com/apache/spark/pull/44636]

> Add maxBytesPerTrigger threshold option
> Key: SPARK-46641
> URL: https://issues.apache.org/jira/browse/SPARK-46641
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Maksim Konstantinov
> Assignee: Maksim Konstantinov
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
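The option resolved above is a byte-based analogue of the existing maxFilesPerTrigger soft limit. As an illustration only (the helper name and greedy admission policy below are hypothetical, not Spark's actual Scala implementation), a byte-budget admission step for one micro-batch could look like:

```python
def select_batch(files, max_bytes_per_trigger):
    """Greedily admit files into a micro-batch until the byte budget is hit.

    `files` is a list of (path, size_bytes) tuples. At least one file is
    always admitted, so a single oversized file cannot stall the stream.
    """
    batch, total = [], 0
    for path, size in files:
        if batch and total + size > max_bytes_per_trigger:
            break
        batch.append(path)
        total += size
    return batch

example = [("a.json", 40), ("b.json", 50), ("c.json", 30)]
first_batch = select_batch(example, 100)  # admits a.json and b.json (90 bytes)
```

Treating the threshold as a soft cap, as Spark's file-count limit does, is what keeps a file larger than the budget from blocking progress forever.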
[jira] [Updated] (SPARK-35303) Enable pinned thread mode by default
ASF GitHub Bot updated SPARK-35303:
Labels: pull-request-available (was: )

> Enable pinned thread mode by default
> Key: SPARK-35303
> URL: https://issues.apache.org/jira/browse/SPARK-35303
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.2.0
>
> Pinned thread mode was added in SPARK-22340. We should enable it by default
> to map each Python thread to a JVM thread, in order to prevent potential
> issues such as thread-local inheritance.
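For background on why the thread mapping matters: in plain Python, `threading.local` values are not inherited by child threads, which is the kind of gap pinned thread mode (and PySpark's InheritableThread) has to work around. A minimal stdlib-only demonstration (illustrative, not PySpark code):

```python
import threading

local = threading.local()
local.tag = "parent-value"

seen = {}

def child():
    # threading.local gives each thread its own namespace, so the child
    # does NOT see the parent's value -- the inheritance gap described above.
    seen["child_tag"] = getattr(local, "tag", None)

t = threading.Thread(target=child)
t.start()
t.join()

parent_tag = local.tag         # "parent-value"
child_tag = seen["child_tag"]  # None: the value was not inherited
```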
[jira] [Updated] (SPARK-35946) Respect Py4J server in InheritableThread API
ASF GitHub Bot updated SPARK-35946:
Labels: pull-request-available (was: )

> Respect Py4J server in InheritableThread API
> Key: SPARK-35946
> URL: https://issues.apache.org/jira/browse/SPARK-35946
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.2.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.2.0
>
> Currently we set the environment variables at the client side of Py4J
> (python/pyspark/util.py). If the Py4J gateway is created somewhere else
> (e.g., Zeppelin), it could introduce a breakage at:
> {code}
> from pyspark import SparkContext
> jvm = SparkContext._jvm
> thread_connection = jvm._gateway_client.get_thread_connection()
> # ^ the MLlibMLflowIntegrationSuite test suite failed at this line
> # `AttributeError: 'GatewayClient' object has no attribute 'get_thread_connection'`
> {code}
[jira] [Commented] (SPARK-47014) Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
Hudson commented on SPARK-47014:
User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/45073

> Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
> Key: SPARK-47014
> URL: https://issues.apache.org/jira/browse/SPARK-47014
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Xinrong Meng
> Priority: Major
>
> Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
[jira] [Created] (SPARK-47014) Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
Xinrong Meng created SPARK-47014:
Summary: Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
Key: SPARK-47014
URL: https://issues.apache.org/jira/browse/SPARK-47014
Project: Spark
Issue Type: Sub-task
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng

Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
[jira] [Updated] (SPARK-47013) Document the config spark.sql.streaming.minBatchesToRetain
Lingeshwaran Radhakrishnan updated SPARK-47013:
Summary: Document the config spark.sql.streaming.minBatchesToRetain (was: Add the config spark.sql.streaming.minBatchesToRetain to the docs)

> Document the config spark.sql.streaming.minBatchesToRetain
> Key: SPARK-47013
> URL: https://issues.apache.org/jira/browse/SPARK-47013
> Project: Spark
> Issue Type: Documentation
> Components: Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Lingeshwaran Radhakrishnan
> Priority: Major
>
> Add the config spark.sql.streaming.minBatchesToRetain to the [streaming
> docs|https://spark.apache.org/docs/latest/configuration.html#spark-streaming]
> page. It controls the minimum number of batches that must be retained and
> made recoverable.
> This would also help control the lifecycle of the state files held in the
> checkpoint folder, i.e., state files are cleaned up based on the config
> spark.sql.streaming.minBatchesToRetain.
[jira] [Updated] (SPARK-47013) Document the config spark.sql.streaming.minBatchesToRetain
Lingeshwaran Radhakrishnan updated SPARK-47013:
Priority: Minor (was: Major)

> Document the config spark.sql.streaming.minBatchesToRetain
> Key: SPARK-47013
> URL: https://issues.apache.org/jira/browse/SPARK-47013
> Project: Spark
> Issue Type: Documentation
> Components: Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Lingeshwaran Radhakrishnan
> Priority: Minor
>
> Add the config spark.sql.streaming.minBatchesToRetain to the [streaming
> docs|https://spark.apache.org/docs/latest/configuration.html#spark-streaming]
> page. It controls the minimum number of batches that must be retained and
> made recoverable.
> This would also help control the lifecycle of the state files held in the
> checkpoint folder, i.e., state files are cleaned up based on the config
> spark.sql.streaming.minBatchesToRetain.
[jira] [Created] (SPARK-47013) Add the config spark.sql.streaming.minBatchesToRetain to the docs
Lingeshwaran Radhakrishnan created SPARK-47013:
Summary: Add the config spark.sql.streaming.minBatchesToRetain to the docs
Key: SPARK-47013
URL: https://issues.apache.org/jira/browse/SPARK-47013
Project: Spark
Issue Type: Documentation
Components: Structured Streaming
Affects Versions: 3.5.0
Reporter: Lingeshwaran Radhakrishnan

Add the config spark.sql.streaming.minBatchesToRetain to the [streaming docs|https://spark.apache.org/docs/latest/configuration.html#spark-streaming] page. It controls the minimum number of batches that must be retained and made recoverable.
This would also help control the lifecycle of the state files held in the checkpoint folder, i.e., state files are cleaned up based on the config spark.sql.streaming.minBatchesToRetain.
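As a sketch of the retention semantics this config documents (pure Python with a hypothetical helper name; the real cleanup lives in Spark's state store maintenance code), keeping the newest N batches recoverable means everything older is eligible for deletion:

```python
def batches_to_delete(committed_batch_ids, min_batches_to_retain):
    """Return the batch ids whose state files may be cleaned up, keeping
    at least the newest `min_batches_to_retain` batches recoverable."""
    if min_batches_to_retain <= 0:
        return []
    ordered = sorted(committed_batch_ids)
    if len(ordered) <= min_batches_to_retain:
        return []  # nothing old enough to prune yet
    return ordered[:-min_batches_to_retain]

# With 5 committed batches and a retention floor of 2, batches 0-2 can go.
doomed = batches_to_delete([0, 1, 2, 3, 4], 2)
```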
[jira] [Created] (SPARK-47012) Built-in SQL Function Support - Collate
Aleksandar Tomic created SPARK-47012:
Summary: Built-in SQL Function Support - Collate
Key: SPARK-47012
URL: https://issues.apache.org/jira/browse/SPARK-47012
Project: Spark
Issue Type: Task
Components: Spark Core
Affects Versions: 4.0.0
Reporter: Aleksandar Tomic
[jira] [Resolved] (SPARK-47002) Enforce that 'AnalyzeResult' 'orderBy' field is a list of pyspark.sql.functions.OrderingColumn
Takuya Ueshin resolved SPARK-47002:
Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 45062
[https://github.com/apache/spark/pull/45062]

> Enforce that 'AnalyzeResult' 'orderBy' field is a list of pyspark.sql.functions.OrderingColumn
> Key: SPARK-47002
> URL: https://issues.apache.org/jira/browse/SPARK-47002
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Daniel
> Assignee: Daniel
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-47002) Enforce that 'AnalyzeResult' 'orderBy' field is a list of pyspark.sql.functions.OrderingColumn
Takuya Ueshin reassigned SPARK-47002:
Assignee: Daniel

> Enforce that 'AnalyzeResult' 'orderBy' field is a list of pyspark.sql.functions.OrderingColumn
> Key: SPARK-47002
> URL: https://issues.apache.org/jira/browse/SPARK-47002
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Daniel
> Assignee: Daniel
> Priority: Major
> Labels: pull-request-available
[jira] [Assigned] (SPARK-47011) Remove deprecated `BinaryClassificationMetrics.scoreLabelsWeight`
Dongjoon Hyun reassigned SPARK-47011:
Assignee: Dongjoon Hyun

> Remove deprecated `BinaryClassificationMetrics.scoreLabelsWeight`
> Key: SPARK-47011
> URL: https://issues.apache.org/jira/browse/SPARK-47011
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-47011) Remove deprecated `BinaryClassificationMetrics.scoreLabelsWeight`
Dongjoon Hyun resolved SPARK-47011:
Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 45070
[https://github.com/apache/spark/pull/45070]

> Remove deprecated `BinaryClassificationMetrics.scoreLabelsWeight`
> Key: SPARK-47011
> URL: https://issues.apache.org/jira/browse/SPARK-47011
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-46690) Support profiling on FlatMapCoGroupsInBatchExec
Xinrong Meng reassigned SPARK-46690:
Assignee: Xinrong Meng

> Support profiling on FlatMapCoGroupsInBatchExec
> Key: SPARK-46690
> URL: https://issues.apache.org/jira/browse/SPARK-46690
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Takuya Ueshin
> Assignee: Xinrong Meng
> Priority: Major
[jira] [Resolved] (SPARK-46690) Support profiling on FlatMapCoGroupsInBatchExec
Xinrong Meng resolved SPARK-46690:
Resolution: Done
Resolved by https://github.com/apache/spark/pull/45050

> Support profiling on FlatMapCoGroupsInBatchExec
> Key: SPARK-46690
> URL: https://issues.apache.org/jira/browse/SPARK-46690
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Takuya Ueshin
> Assignee: Xinrong Meng
> Priority: Major
[jira] [Resolved] (SPARK-46689) Support profiling on FlatMapGroupsInBatchExec
Xinrong Meng resolved SPARK-46689:
Resolution: Done
Resolved by https://github.com/apache/spark/pull/45050

> Support profiling on FlatMapGroupsInBatchExec
> Key: SPARK-46689
> URL: https://issues.apache.org/jira/browse/SPARK-46689
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Takuya Ueshin
> Assignee: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
[jira] [Assigned] (SPARK-46689) Support profiling on FlatMapGroupsInBatchExec
Xinrong Meng reassigned SPARK-46689:
Assignee: Xinrong Meng

> Support profiling on FlatMapGroupsInBatchExec
> Key: SPARK-46689
> URL: https://issues.apache.org/jira/browse/SPARK-46689
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Takuya Ueshin
> Assignee: Xinrong Meng
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-47011) Remove deprecated `BinaryClassificationMetrics.scoreLabelsWeight`
Dongjoon Hyun updated SPARK-47011:
Parent: SPARK-44111
Issue Type: Sub-task (was: Task)

> Remove deprecated `BinaryClassificationMetrics.scoreLabelsWeight`
> Key: SPARK-47011
> URL: https://issues.apache.org/jira/browse/SPARK-47011
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-47011) Remove deprecated `BinaryClassificationMetrics.scoreLabelsWeight`
Dongjoon Hyun created SPARK-47011:
Summary: Remove deprecated `BinaryClassificationMetrics.scoreLabelsWeight`
Key: SPARK-47011
URL: https://issues.apache.org/jira/browse/SPARK-47011
Project: Spark
Issue Type: Task
Components: MLlib
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-47011) Remove deprecated `BinaryClassificationMetrics.scoreLabelsWeight`
ASF GitHub Bot updated SPARK-47011:
Labels: pull-request-available (was: )

> Remove deprecated `BinaryClassificationMetrics.scoreLabelsWeight`
> Key: SPARK-47011
> URL: https://issues.apache.org/jira/browse/SPARK-47011
> Project: Spark
> Issue Type: Task
> Components: MLlib
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-47010) Kubernetes: support csi driver for volume type
Oleg Frenkel created SPARK-47010:
Summary: Kubernetes: support csi driver for volume type
Key: SPARK-47010
URL: https://issues.apache.org/jira/browse/SPARK-47010
Project: Spark
Issue Type: New Feature
Components: Kubernetes
Affects Versions: 3.5.0
Reporter: Oleg Frenkel

Today Spark supports the following types of Kubernetes [volumes|https://kubernetes.io/docs/concepts/storage/volumes/]: hostPath, emptyDir, nfs and persistentVolumeClaim. In our case, the Kubernetes cluster is multi-tenant and we cannot make cluster-wide changes when deploying our application to the cluster. Our application requires a static shared file system. So we cannot use hostPath (we don't control the hosting VMs) or persistentVolumeClaim (it requires a cluster-wide change when deploying the PV), and our security department does not allow nfs. What would help in our case is the use of a csi driver (taken from here: https://github.com/kubernetes-sigs/azurefile-csi-driver/blob/master/deploy/example/e2e_usage.md#option3-inline-volume):
{code:yaml}
kind: Pod
apiVersion: v1
metadata:
  name: nginx-azurefile-inline-volume
spec:
  nodeSelector:
    "kubernetes.io/os": linux
  containers:
    - image: mcr.microsoft.com/oss/nginx/nginx:1.19.5
      name: nginx-azurefile
      command:
        - "/bin/bash"
        - "-c"
        - set -euo pipefail; while true; do echo $(date) >> /mnt/azurefile/outfile; sleep 1; done
      volumeMounts:
        - name: persistent-storage
          mountPath: "/mnt/azurefile"
          readOnly: false
  volumes:
    - name: persistent-storage
      csi:
        driver: file.csi.azure.com
        volumeAttributes:
          shareName: EXISTING_SHARE_NAME  # required
          secretName: azure-secret  # required
          mountOptions: "dir_mode=0777,file_mode=0777,cache=strict,actimeo=30,nosharesock"  # optional
{code}
[jira] [Created] (SPARK-47009) Create table with collation
Stefan Kandic created SPARK-47009:
Summary: Create table with collation
Key: SPARK-47009
URL: https://issues.apache.org/jira/browse/SPARK-47009
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Stefan Kandic

Add support for creating tables with columns containing non-default collated data.
[jira] [Created] (SPARK-47008) Spark to support S3 Express One Zone Storage
Steve Loughran created SPARK-47008:
Summary: Spark to support S3 Express One Zone Storage
Key: SPARK-47008
URL: https://issues.apache.org/jira/browse/SPARK-47008
Project: Spark
Issue Type: Sub-task
Components: Spark Core
Affects Versions: 3.5.1
Reporter: Steve Loughran

Hadoop 3.4.0 adds support for AWS S3 Express One Zone Storage. Most of this is transparent. However, one aspect which can surface as an issue is that these stores report prefixes in a listing when there are pending uploads, *even when there are no files underneath*.
This leads to a situation where a listStatus of a path returns a list of file status entries which appears to contain one or more directories, but a listStatus on that path raises a FileNotFoundException: there is nothing there.
HADOOP-18996 handles this in all of the Hadoop code, including FileInputFormat. A filesystem can now be probed for inconsistent directory listings through {{fs.hasPathCapability(path, "fs.capability.directory.listing.inconsistent")}}. If true, then treewalking code SHOULD NOT report a failure if, when walking into a subdirectory, a list/getFileStatus on that directory raises a FileNotFoundException.
Although most of this is handled in the Hadoop code, there are some places where treewalking is done inside Spark. These need to be identified and made resilient to failure on the recurse down the tree:
* SparkHadoopUtil list methods, especially listLeafStatuses used by OrcFileOperator
* org.apache.spark.util.Utils#fetchHcfsFile

{{org.apache.hadoop.fs.FileUtil.maybeIgnoreMissingDirectory()}} can assist here, or the logic can be replicated. Using the Hadoop implementation would be better from a maintenance perspective.
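The resilient treewalk the ticket asks for can be sketched in pure Python (illustrative only; the real fix would live in the Scala SparkHadoopUtil/Utils code and could delegate to FileUtil.maybeIgnoreMissingDirectory). The fake filesystem below simulates the S3 Express behaviour: a prefix that appears in its parent's listing but raises FileNotFoundError when listed directly.

```python
def list_leaf_files(fs_list, root, inconsistent_listing=True):
    """Recursively collect leaf files, tolerating directories that vanish
    between the parent listing and the recursive list call (as S3 Express
    can report prefixes for pending uploads with nothing underneath)."""
    leaves = []
    try:
        entries = fs_list(root)
    except FileNotFoundError:
        if inconsistent_listing:
            return []  # phantom prefix: treat as empty rather than fail
        raise
    for name, is_dir in entries:
        if is_dir:
            leaves.extend(list_leaf_files(fs_list, name, inconsistent_listing))
        else:
            leaves.append(name)
    return leaves

# "ghost" shows up in the listing of "/" but cannot itself be listed.
tree = {"/": [("a.txt", False), ("ghost", True)]}

def fake_list(path):
    if path not in tree:
        raise FileNotFoundError(path)
    return tree[path]

files = list_leaf_files(fake_list, "/")  # the walk survives the phantom dir
```

With `inconsistent_listing=False` the same walk would propagate the FileNotFoundException, which is today's failure mode.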
[jira] [Updated] (SPARK-47007) Add SortMap function
ASF GitHub Bot updated SPARK-47007:
Labels: pull-request-available (was: )

> Add SortMap function
> Key: SPARK-47007
> URL: https://issues.apache.org/jira/browse/SPARK-47007
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Stefan Kandic
> Priority: Major
> Labels: pull-request-available
>
> In order to properly support GROUP BY on a map type, we first need the
> ability to sort the map so that the comparisons can be done later.
[jira] [Assigned] (SPARK-39910) DataFrameReader API cannot read files from hadoop archives (.har)
Wenchen Fan reassigned SPARK-39910:
Assignee: Christophe Préaud

> DataFrameReader API cannot read files from hadoop archives (.har)
> Key: SPARK-39910
> URL: https://issues.apache.org/jira/browse/SPARK-39910
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.3, 3.1.3, 3.3.0, 3.2.2
> Reporter: Christophe Préaud
> Assignee: Christophe Préaud
> Priority: Minor
> Labels: DataFrameReader, pull-request-available
>
> Reading a file from a Hadoop archive using the DataFrameReader API returns
> an empty Dataset:
> {code:java}
> scala> val df = spark.read.textFile("har:///user/preaudc/logs/lead/jp/2022/202207.har/20220719")
> df: org.apache.spark.sql.Dataset[String] = [value: string]
> scala> df.count
> res7: Long = 0
> {code}
> On the other hand, reading the same file from the same Hadoop archive, but
> using the RDD API, yields the correct result:
> {code:java}
> scala> val df = sc.textFile("har:///user/preaudc/logs/lead/jp/2022/202207.har/20220719").toDF("value")
> df: org.apache.spark.sql.DataFrame = [value: string]
> scala> df.count
> res8: Long = 5589
> {code}
[jira] [Resolved] (SPARK-39910) DataFrameReader API cannot read files from hadoop archives (.har)
Wenchen Fan resolved SPARK-39910:
Fix Version/s: 3.5.1, 4.0.0
Resolution: Fixed
Issue resolved by pull request 43463
[https://github.com/apache/spark/pull/43463]

> DataFrameReader API cannot read files from hadoop archives (.har)
> Key: SPARK-39910
> URL: https://issues.apache.org/jira/browse/SPARK-39910
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.3, 3.1.3, 3.3.0, 3.2.2
> Reporter: Christophe Préaud
> Assignee: Christophe Préaud
> Priority: Minor
> Labels: DataFrameReader, pull-request-available
> Fix For: 3.5.1, 4.0.0
>
> Reading a file from a Hadoop archive using the DataFrameReader API returns
> an empty Dataset:
> {code:java}
> scala> val df = spark.read.textFile("har:///user/preaudc/logs/lead/jp/2022/202207.har/20220719")
> df: org.apache.spark.sql.Dataset[String] = [value: string]
> scala> df.count
> res7: Long = 0
> {code}
> On the other hand, reading the same file from the same Hadoop archive, but
> using the RDD API, yields the correct result:
> {code:java}
> scala> val df = sc.textFile("har:///user/preaudc/logs/lead/jp/2022/202207.har/20220719").toDF("value")
> df: org.apache.spark.sql.DataFrame = [value: string]
> scala> df.count
> res8: Long = 5589
> {code}
[jira] [Created] (SPARK-47007) Add SortMap function
Stefan Kandic created SPARK-47007:
Summary: Add SortMap function
Key: SPARK-47007
URL: https://issues.apache.org/jira/browse/SPARK-47007
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Stefan Kandic

In order to properly support GROUP BY on a map type, we first need the ability to sort the map so that the comparisons can be done later.
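The motivation above (making map values comparable so they can be grouped) can be illustrated in pure Python by canonicalizing each map to its key-sorted entries. This is a sketch of the idea only, not the Catalyst expression the ticket proposes:

```python
def sort_map(m):
    """Return the map's entries as a tuple sorted by key, giving maps a
    canonical, hashable form so equal maps compare (and group) equal
    regardless of insertion order."""
    return tuple(sorted(m.items()))

# Two rows carry the same map in different insertion orders; a third differs.
rows = [{"b": 2, "a": 1}, {"a": 1, "b": 2}, {"a": 9}]

groups = {}
for m in rows:
    groups.setdefault(sort_map(m), []).append(m)

group_sizes = sorted(len(v) for v in groups.values())  # [1, 2]
```

Without the canonical form, the maps would have no stable sort key to group on, which is exactly the gap SortMap fills for the SQL engine.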
[jira] [Resolved] (SPARK-46999) ExpressionWithUnresolvedIdentifier should include other expressions in the expression tree
Wenchen Fan resolved SPARK-46999:
Fix Version/s: 4.0.0
Assignee: Wenchen Fan
Resolution: Fixed

> ExpressionWithUnresolvedIdentifier should include other expressions in the expression tree
> Key: SPARK-46999
> URL: https://issues.apache.org/jira/browse/SPARK-46999
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-46993) Allow session variables in more places such as from_json for schema
Wenchen Fan resolved SPARK-46993:
Fix Version/s: 4.0.0
Assignee: Serge Rielau
Resolution: Fixed

> Allow session variables in more places such as from_json for schema
> Key: SPARK-46993
> URL: https://issues.apache.org/jira/browse/SPARK-46993
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.2
> Reporter: Serge Rielau
> Assignee: Serge Rielau
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> It appears we do not allow session variables to provide a schema for
> from_json(). This is likely a generic restriction regarding constant folding.
[jira] [Updated] (SPARK-47006) Refactor refill() method to isExhausted() in NioBufferedFileInputStream
Yang Jie updated SPARK-47006:
Description: Currently, in NioBufferedFileInputStream, the refill() method is always invoked in a negated context (!refill()), which can be confusing and counter-intuitive. We can refactor the method so that it's no longer necessary to invert the result of the method call.

> Refactor refill() method to isExhausted() in NioBufferedFileInputStream
> Key: SPARK-47006
> URL: https://issues.apache.org/jira/browse/SPARK-47006
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Priority: Minor
>
> Currently, in NioBufferedFileInputStream, the refill() method is always
> invoked in a negated context (!refill()), which can be confusing and
> counter-intuitive. We can refactor the method so that it's no longer
> necessary to invert the result of the method call.
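The readability argument can be seen in a small pure-Python analogue (illustrative only; the real class is Java's NioBufferedFileInputStream, and the method names below are paraphrases):

```python
class BufferedReader:
    """Minimal sketch contrasting the two styles: a refill() that callers
    must negate versus a positively named is_exhausted() probe."""

    def __init__(self, chunks):
        self._chunks = list(chunks)  # pending data blocks, oldest first
        self._buf = b""
        self._pos = 0

    def _refill_if_needed(self):
        # Pull the next chunk into the buffer once the current one is spent.
        if self._pos == len(self._buf) and self._chunks:
            self._buf = self._chunks.pop(0)
            self._pos = 0

    def is_exhausted(self):
        # Reads naturally at call sites: `if reader.is_exhausted(): ...`
        # instead of the inverted `if not reader.refill(): ...`.
        self._refill_if_needed()
        return self._pos == len(self._buf)

    def read_byte(self):
        if self.is_exhausted():
            return -1  # mirrors InputStream's end-of-stream convention
        b = self._buf[self._pos]
        self._pos += 1
        return b

r = BufferedReader([b"hi"])
out = [r.read_byte(), r.read_byte(), r.read_byte()]  # [104, 105, -1]
```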
[jira] [Created] (SPARK-47006) Refactor refill() method to isExhausted() in NioBufferedFileInputStream
Yang Jie created SPARK-47006:
Summary: Refactor refill() method to isExhausted() in NioBufferedFileInputStream
Key: SPARK-47006
URL: https://issues.apache.org/jira/browse/SPARK-47006
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie