[jira] [Created] (SPARK-45782) Add DataFrame API df.explainString()
Khalid Mammadov created SPARK-45782:
------------------------------------

Summary: Add DataFrame API df.explainString()
Key: SPARK-45782
URL: https://issues.apache.org/jira/browse/SPARK-45782
Project: Spark
Issue Type: Improvement
Components: Connect, PySpark, Spark Core
Affects Versions: 4.0.0
Reporter: Khalid Mammadov

This is a frequently needed feature for performance-optimization purposes. Users often want to inspect this output on running systems, and would like to save/extract it from those systems for later analysis. The current API is only provided for Scala, i.e. {{df.queryExecution.toString()}}, and it is not located in an intuitive place where the average Spark user (i.e. a non-expert, non-Scala developer) can find it immediately. It would also save users from workarounds that capture the printed output, such as:

{code:python}
import io
from contextlib import redirect_stdout

with io.StringIO() as buf, redirect_stdout(buf):
    df.explain(True)
    plan = buf.getvalue()
{code}

So it would help users a lot to have this output available as df.explainString(), i.e. right next to df.explain(), so users can easily locate and use it.
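For reference, the plan string can already be reached through PySpark's py4j handle today; a minimal sketch, noting that _jdf is a private attribute that may change between releases, and that the explainString call shown last is the hypothetical API this ticket proposes:

{code:python}
# Works today, but leans on a private attribute: _jdf is the underlying
# JVM Dataset exposed through py4j (df is an existing DataFrame).
plan_text = df._jdf.queryExecution().toString()

# Hypothetical usage once this ticket lands (signature assumed to mirror
# df.explain(extended=...)):
# plan_text = df.explainString(extended=True)
{code}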
[jira] [Created] (SPARK-45716) Python parity method StructType.treeString
Khalid Mammadov created SPARK-45716:
------------------------------------

Summary: Python parity method StructType.treeString
Key: SPARK-45716
URL: https://issues.apache.org/jira/browse/SPARK-45716
Project: Spark
Issue Type: Improvement
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Khalid Mammadov

Add the missing parity method StructType.treeString from Scala to Python.
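For illustration, Scala's StructType.treeString renders the schema as an indented tree; a hypothetical Python parity call could look like this (the method does not exist on the Python StructType yet, and the output shown mirrors the Scala side):

{code:python}
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

schema = StructType([
    StructField("id", IntegerType()),
    StructField("name", StringType()),
])

# Proposed parity method (hypothetical; name taken from this ticket):
print(schema.treeString())
# root
#  |-- id: integer (nullable = true)
#  |-- name: string (nullable = true)
{code}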
[jira] [Created] (SPARK-43243) Add Level param to df.printSchema for Python API
Khalid Mammadov created SPARK-43243:
------------------------------------

Summary: Add Level param to df.printSchema for Python API
Key: SPARK-43243
URL: https://issues.apache.org/jira/browse/SPARK-43243
Project: Spark
Issue Type: Improvement
Components: Connect, PySpark
Affects Versions: 3.5.0
Reporter: Khalid Mammadov

printSchema in the Python DataFrame API is missing the level parameter that is available in the Scala API. This is to add it.
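A sketch of the intended behaviour, assuming the Python signature mirrors Scala's printSchema(level: Int); the sample schema and the truncated output are illustrative:

{code:python}
df = spark.createDataFrame([(1, (2, 3))], "a INT, b STRUCT<x: INT, y: INT>")

# Hypothetical once the parity parameter is added:
df.printSchema(1)   # print only the first level of nesting
# root
#  |-- a: integer (nullable = true)
#  |-- b: struct (nullable = true)
{code}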
[jira] [Updated] (SPARK-42437) Pyspark catalog.cacheTable allow to specify storage level Connect add support Storagelevel
[ https://issues.apache.org/jira/browse/SPARK-42437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khalid Mammadov updated SPARK-42437:
------------------------------------
Target Version/s: (was: 3.5.0)
Affects Version/s: 3.4.0 (was: 3.5.0)
Summary: Pyspark catalog.cacheTable allow to specify storage level Connect add support Storagelevel (was: Pyspark catalog.cacheTable allow to specify storage level)

> Pyspark catalog.cacheTable allow to specify storage level Connect add support Storagelevel
> -------------------------------------------------------------------------------------------
>
> Key: SPARK-42437
> URL: https://issues.apache.org/jira/browse/SPARK-42437
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Khalid Mammadov
> Priority: Major
>
> Currently the PySpark version of the catalog.cacheTable function does not support specifying a storage level. This is to add that.
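A sketch of the requested call, assuming the new parameter mirrors the Scala Catalog API; the storageLevel argument is the addition this ticket proposes:

{code:python}
from pyspark import StorageLevel

# Hypothetical signature, assumed from the ticket:
spark.catalog.cacheTable("customer", storageLevel=StorageLevel.DISK_ONLY)
{code}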
[jira] [Created] (SPARK-42437) Pyspark catalog.cacheTable allow to specify storage level
Khalid Mammadov created SPARK-42437:
------------------------------------

Summary: Pyspark catalog.cacheTable allow to specify storage level
Key: SPARK-42437
URL: https://issues.apache.org/jira/browse/SPARK-42437
Project: Spark
Issue Type: Improvement
Components: Connect, PySpark
Affects Versions: 3.5.0
Reporter: Khalid Mammadov

Currently the PySpark version of the catalog.cacheTable function does not support specifying a storage level. This is to add that.
[jira] [Created] (SPARK-42400) Code clean up in org.apache.spark.storage
Khalid Mammadov created SPARK-42400:
------------------------------------

Summary: Code clean up in org.apache.spark.storage
Key: SPARK-42400
URL: https://issues.apache.org/jira/browse/SPARK-42400
Project: Spark
Issue Type: Improvement
Components: Block Manager
Affects Versions: 3.4.0
Reporter: Khalid Mammadov
[jira] [Created] (SPARK-42257) Remove unused variable in ExternalSorter
Khalid Mammadov created SPARK-42257:
------------------------------------

Summary: Remove unused variable in ExternalSorter
Key: SPARK-42257
URL: https://issues.apache.org/jira/browse/SPARK-42257
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.4.0
Reporter: Khalid Mammadov

The nextPartitionId variable is not used anywhere in the writePartitionedMapOutput method.
[jira] [Commented] (SPARK-37946) Use error classes in the execution errors related to partitions
[ https://issues.apache.org/jira/browse/SPARK-37946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626866#comment-17626866 ]

Khalid Mammadov commented on SPARK-37946:
------------------------------------------

Hi [~maxgekk], I see this one is not done yet: partitionColumnNotFoundInSchemaError. Can I look into it?

Also, there are some more waiting to be done in QueryExecutionErrors.scala, e.g.:
* stateNotDefinedOrAlreadyRemovedError
* cannotSetTimeoutDurationError
* cannotGetEventTimeWatermarkError
* cannotSetTimeoutTimestampError
* batchMetadataFileNotFoundError

Shall I look into these as well?

> Use error classes in the execution errors related to partitions
> ----------------------------------------------------------------
>
> Key: SPARK-37946
> URL: https://issues.apache.org/jira/browse/SPARK-37946
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Max Gekk
> Priority: Major
>
> Migrate the following errors in QueryExecutionErrors:
> * unableToDeletePartitionPathError
> * unableToCreatePartitionPathError
> * unableToRenamePartitionPathError
> * notADatasourceRDDPartitionError
> * cannotClearPartitionDirectoryError
> * failedToCastValueToDataTypeForPartitionColumnError
> * unsupportedPartitionTransformError
> * cannotCreateJDBCTableWithPartitionsError
> * requestedPartitionsMismatchTablePartitionsError
> * dynamicPartitionKeyNotAmongWrittenPartitionPathsError
> * cannotRemovePartitionDirError
> * alterTableWithDropPartitionAndPurgeUnsupportedError
> * invalidPartitionFilterError
> * getPartitionMetadataByFilterError
> * illegalLocationClauseForViewPartitionError
> * partitionColumnNotFoundInSchemaError
> * cannotAddMultiPartitionsOnNonatomicPartitionTableError
> * cannotDropMultiPartitionsOnNonatomicPartitionTableError
> * truncateMultiPartitionUnsupportedError
> * dynamicPartitionOverwriteUnsupportedByTableError
> * writePartitionExceedConfigSizeWhenDynamicPartitionError
> onto error classes. Throw an implementation of SparkThrowable. Also write a test for every error in QueryExecutionErrorsSuite.
[jira] [Comment Edited] (SPARK-37945) Use error classes in the execution errors of arithmetic ops
[ https://issues.apache.org/jira/browse/SPARK-37945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618038#comment-17618038 ]

Khalid Mammadov edited comment on SPARK-37945 at 10/15/22 8:43 AM:
--------------------------------------------------------------------

I was [about to finish|https://github.com/apache/spark/pull/38266] this, but I can see you set these error classes to _LEGACY, whose background I don't know, so I will leave this to you, [~maxgekk]. Please let me know if there is one I can work on.

was (Author: JIRAUSER284054):
I was [about to finish|https://github.com/apache/spark/pull/38266] this, but I can see you set these error classes to _LEGACY, whose background I don't know, so I will leave this to you, [~maxgekk].

> Use error classes in the execution errors of arithmetic ops
> ------------------------------------------------------------
>
> Key: SPARK-37945
> URL: https://issues.apache.org/jira/browse/SPARK-37945
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Max Gekk
> Priority: Major
>
> Migrate the following errors in QueryExecutionErrors:
> * overflowInSumOfDecimalError
> * overflowInIntegralDivideError
> * arithmeticOverflowError
> * unaryMinusCauseOverflowError
> * binaryArithmeticCauseOverflowError
> * unscaledValueTooLargeForPrecisionError
> * decimalPrecisionExceedsMaxPrecisionError
> * outOfDecimalTypeRangeError
> * integerOverflowError
> onto error classes. Throw an implementation of SparkThrowable. Also write a test for every error in QueryExecutionErrorsSuite.
[jira] [Commented] (SPARK-37945) Use error classes in the execution errors of arithmetic ops
[ https://issues.apache.org/jira/browse/SPARK-37945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618038#comment-17618038 ]

Khalid Mammadov commented on SPARK-37945:
------------------------------------------

I was [about to finish|https://github.com/apache/spark/pull/38266] this, but I can see you set these error classes to _LEGACY, whose background I don't know, so I will leave this to you, [~maxgekk].

> Use error classes in the execution errors of arithmetic ops
> ------------------------------------------------------------
>
> Key: SPARK-37945
> URL: https://issues.apache.org/jira/browse/SPARK-37945
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Max Gekk
> Priority: Major
>
> Migrate the following errors in QueryExecutionErrors:
> * overflowInSumOfDecimalError
> * overflowInIntegralDivideError
> * arithmeticOverflowError
> * unaryMinusCauseOverflowError
> * binaryArithmeticCauseOverflowError
> * unscaledValueTooLargeForPrecisionError
> * decimalPrecisionExceedsMaxPrecisionError
> * outOfDecimalTypeRangeError
> * integerOverflowError
> onto error classes. Throw an implementation of SparkThrowable. Also write a test for every error in QueryExecutionErrorsSuite.
[jira] [Commented] (SPARK-37945) Use error classes in the execution errors of arithmetic ops
[ https://issues.apache.org/jira/browse/SPARK-37945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615361#comment-17615361 ]

Khalid Mammadov commented on SPARK-37945:
------------------------------------------

[~maxgekk] I see you have already fixed most of these; I can pick up (and have already started) the ones below, if OK?
* unscaledValueTooLargeForPrecisionError
* decimalPrecisionExceedsMaxPrecisionError
* outOfDecimalTypeRangeError
* integerOverflowError

PS: Looks fairly straightforward and shouldn't take long.

> Use error classes in the execution errors of arithmetic ops
> ------------------------------------------------------------
>
> Key: SPARK-37945
> URL: https://issues.apache.org/jira/browse/SPARK-37945
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Max Gekk
> Priority: Major
>
> Migrate the following errors in QueryExecutionErrors:
> * overflowInSumOfDecimalError
> * overflowInIntegralDivideError
> * arithmeticOverflowError
> * unaryMinusCauseOverflowError
> * binaryArithmeticCauseOverflowError
> * unscaledValueTooLargeForPrecisionError
> * decimalPrecisionExceedsMaxPrecisionError
> * outOfDecimalTypeRangeError
> * integerOverflowError
> onto error classes. Throw an implementation of SparkThrowable. Also write a test for every error in QueryExecutionErrorsSuite.
[jira] [Commented] (SPARK-38465) Use error classes in org.apache.spark.launcher
[ https://issues.apache.org/jira/browse/SPARK-38465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611896#comment-17611896 ]

Khalid Mammadov commented on SPARK-38465:
------------------------------------------

Hi [~bozhang] [~maxgekk], I would like to look into this, if there are no objections.

> Use error classes in org.apache.spark.launcher
> -----------------------------------------------
>
> Key: SPARK-38465
> URL: https://issues.apache.org/jira/browse/SPARK-38465
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.3.0
> Reporter: Bo Zhang
> Priority: Major
[jira] [Created] (SPARK-40620) Deduplication of WorkerOffer build in CoarseGrainedSchedulerBackend
Khalid Mammadov created SPARK-40620:
------------------------------------

Summary: Deduplication of WorkerOffer build in CoarseGrainedSchedulerBackend
Key: SPARK-40620
URL: https://issues.apache.org/jira/browse/SPARK-40620
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.4.0
Reporter: Khalid Mammadov

The WorkerOffer build in CoarseGrainedSchedulerBackend is repeated in two different places with exactly the same parameters. We can deduplicate and improve readability by moving it into a private function.
[jira] [Created] (SPARK-40210) Fix math atan2, hypot, pow and pmod float argument call
Khalid Mammadov created SPARK-40210:
------------------------------------

Summary: Fix math atan2, hypot, pow and pmod float argument call
Key: SPARK-40210
URL: https://issues.apache.org/jira/browse/SPARK-40210
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 3.4.0
Reporter: Khalid Mammadov

The PySpark atan2, hypot, pow and pmod functions are documented as accepting float arguments, but they produce an error when both arguments are passed as plain floats.
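A reproduction sketch based on the ticket's description; the exact exception text varies by version, and the claim that two float literals fail comes from this ticket:

{code:python}
from pyspark.sql import functions as F

df = spark.range(1)  # assumes an active SparkSession named spark

df.select(F.pow(F.col("id"), 2.0))   # column + float literal: works
df.select(F.pow(2.0, 3.0))           # two plain floats: reportedly raised an error
{code}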
[jira] [Updated] (SPARK-40009) Add missing doc string info to DataFrame API
[ https://issues.apache.org/jira/browse/SPARK-40009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khalid Mammadov updated SPARK-40009:
------------------------------------
Description:
Some of the docstrings in the Python DataFrame API are incomplete; for example, some are missing the Parameters, Returns, or Examples sections. It would help users if we provided this missing information for all methods/functions.
(was: Provide examples for DataFrame union and unionAll functions for PySpark. Also document parameters)

> Add missing doc string info to DataFrame API
> ---------------------------------------------
>
> Key: SPARK-40009
> URL: https://issues.apache.org/jira/browse/SPARK-40009
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 3.4.0
> Reporter: Khalid Mammadov
> Priority: Minor
>
> Some of the docstrings in the Python DataFrame API are incomplete; for example, some are missing the Parameters, Returns, or Examples sections. It would help users if we provided this missing information for all methods/functions.
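As an illustration of the sections in question, a sketch of the numpydoc-style layout PySpark docstrings follow; the wording below is illustrative, not the actual union docstring:

{code:python}
def union(self, other):
    """Return a new :class:`DataFrame` containing the union of rows
    in this and another :class:`DataFrame`.

    Parameters
    ----------
    other : :class:`DataFrame`
        Another :class:`DataFrame`, unioned by column position.

    Returns
    -------
    :class:`DataFrame`
        The combined :class:`DataFrame`; duplicates are preserved
        (use :func:`distinct` to drop them).

    Examples
    --------
    >>> df1.union(df2).distinct().count()
    """
    ...
{code}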
[jira] [Updated] (SPARK-40009) Add missing doc string info to DataFrame API
[ https://issues.apache.org/jira/browse/SPARK-40009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khalid Mammadov updated SPARK-40009:
------------------------------------
Summary: Add missing doc string info to DataFrame API (was: Add doc string to DataFrame union and unionAll)

> Add missing doc string info to DataFrame API
> ---------------------------------------------
>
> Key: SPARK-40009
> URL: https://issues.apache.org/jira/browse/SPARK-40009
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 3.4.0
> Reporter: Khalid Mammadov
> Priority: Minor
>
> Provide examples for DataFrame union and unionAll functions for PySpark. Also document parameters.
[jira] [Created] (SPARK-40009) Add doc string to DataFrame union and unionAll
Khalid Mammadov created SPARK-40009:
------------------------------------

Summary: Add doc string to DataFrame union and unionAll
Key: SPARK-40009
URL: https://issues.apache.org/jira/browse/SPARK-40009
Project: Spark
Issue Type: Improvement
Components: Documentation
Affects Versions: 3.4.0
Reporter: Khalid Mammadov

Provide examples for DataFrame union and unionAll functions for PySpark. Also document parameters.
[jira] [Created] (SPARK-39982) StructType.fromJson method missing documentation
Khalid Mammadov created SPARK-39982:
------------------------------------

Summary: StructType.fromJson method missing documentation
Key: SPARK-39982
URL: https://issues.apache.org/jira/browse/SPARK-39982
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 3.3.0
Reporter: Khalid Mammadov

The StructType.fromJson method does not have any documentation. It would be good to add some that explains how to use it.
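For context, a short example of the undocumented method; StructType.fromJson exists in pyspark.sql.types and accepts the dict shape that StructType.jsonValue() produces:

{code:python}
from pyspark.sql.types import StructType

schema_dict = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "integer", "nullable": True, "metadata": {}},
        {"name": "name", "type": "string", "nullable": True, "metadata": {}},
    ],
}

schema = StructType.fromJson(schema_dict)
print(schema.simpleString())  # struct<id:int,name:string>
{code}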
[jira] [Created] (SPARK-38261) Sync missing R packages with CI
Khalid Mammadov created SPARK-38261:
------------------------------------

Summary: Sync missing R packages with CI
Key: SPARK-38261
URL: https://issues.apache.org/jira/browse/SPARK-38261
Project: Spark
Issue Type: Github Integration
Components: Build
Affects Versions: 3.2.1
Reporter: Khalid Mammadov

The current GitHub workflow job *Linters, licenses, dependencies and documentation generation* is missing the R packages needed to complete the documentation and API build. *Build and test* is not failing because these packages are installed in the base image. IMO we need to keep them in sync with the base image, for an easy switch back to the ubuntu runner when ready.

The missing R packages are *markdown* and *e1071*.

Reference: base image - https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore
[jira] [Created] (SPARK-38210) Spark documentation build README is stale
Khalid Mammadov created SPARK-38210:
------------------------------------

Summary: Spark documentation build README is stale
Key: SPARK-38210
URL: https://issues.apache.org/jira/browse/SPARK-38210
Project: Spark
Issue Type: Documentation
Components: Documentation
Affects Versions: 3.2.1
Reporter: Khalid Mammadov

I was following docs/README.md to build the documentation and found that it is not complete. I had to install additional packages that are not documented but are available in the [CI/CD phase|https://github.com/apache/spark/blob/c8b34ab7340265f1f2bec2afa694c10f174b222c/.github/workflows/build_and_test.yml#L526], plus a few more, to finish the build process. I will file a PR to change README.md to include these packages and improve the guide.
[jira] [Created] (SPARK-38120) HiveExternalCatalog.listPartitions is failing when partition column name is upper case and dot in partition value
Khalid Mammadov created SPARK-38120:
------------------------------------

Summary: HiveExternalCatalog.listPartitions is failing when partition column name is upper case and dot in partition value
Key: SPARK-38120
URL: https://issues.apache.org/jira/browse/SPARK-38120
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.2.1
Reporter: Khalid Mammadov

The HiveExternalCatalog.listPartitions method call fails when a partition column name is upper case and the partition value contains a dot. It's related to this change: https://github.com/apache/spark/commit/f18b905f6cace7686ef169fda7de474079d0af23
The test case in that PR does not reproduce the issue because its partition column name is lower case.

Below is how to reproduce the issue:

{code:scala}
scala> import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.TableIdentifier

scala> spark.sql("CREATE TABLE customer(id INT, name STRING) PARTITIONED BY (partCol1 STRING, partCol2 STRING)")

scala> spark.sql("INSERT INTO customer PARTITION (partCol1 = 'CA', partCol2 = 'i.j') VALUES (100, 'John')")

scala> spark.sessionState.catalog.listPartitions(TableIdentifier("customer"), Some(Map("partCol2" -> "i.j"))).foreach(println)
java.util.NoSuchElementException: key not found: partcol2
  at scala.collection.immutable.Map$Map2.apply(Map.scala:227)
  at org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.$anonfun$isPartialPartitionSpec$1(ExternalCatalogUtils.scala:205)
  at org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.$anonfun$isPartialPartitionSpec$1$adapted(ExternalCatalogUtils.scala:202)
  at scala.collection.immutable.Map$Map1.forall(Map.scala:196)
  at org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.isPartialPartitionSpec(ExternalCatalogUtils.scala:202)
  at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$listPartitions$6(HiveExternalCatalog.scala:1312)
  at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$listPartitions$6$adapted(HiveExternalCatalog.scala:1312)
  at scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:304)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:303)
  at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:297)
  at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
  at scala.collection.TraversableLike.filter(TraversableLike.scala:395)
  at scala.collection.TraversableLike.filter$(TraversableLike.scala:395)
  at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
  at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$listPartitions$1(HiveExternalCatalog.scala:1312)
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClientWrappingException(HiveExternalCatalog.scala:114)
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:103)
  at org.apache.spark.sql.hive.HiveExternalCatalog.listPartitions(HiveExternalCatalog.scala:1296)
  at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listPartitions(ExternalCatalogWithListener.scala:254)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitions(SessionCatalog.scala:1251)
  ... 47 elided
{code}
[jira] [Commented] (SPARK-37996) Contribution guide is stale
[ https://issues.apache.org/jira/browse/SPARK-37996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486139#comment-17486139 ]

Khalid Mammadov commented on SPARK-37996:
------------------------------------------

Raised PR: https://github.com/apache/spark-website/pull/378

It makes the following changes:
* Describes, in the Pull Request section of the Contributing page, the actual procedure, taking a contributor through a step-by-step process.
* Removes the optional "Running tests in your forked repository" section on the Developer Tools page, which is now obsolete and no longer reflects reality: it says tests can be run by clicking the "Run workflow" button, but that button is no longer available because the workflow no longer uses the "workflow_dispatch" event trigger, which was removed in [[SPARK-35048][INFRA] Distribute GitHub Actions workflows to fork repositories to share the resources spark#32092|https://github.com/apache/spark/pull/32092].
* Instead, documents the new procedure that the above PR introduced: contributors need to use their own free GitHub workflow credits to test the changes they are proposing, and a Spark Actions workflow expects that to be completed before marking a PR ready for review.
* Some general wording was copied from the "Running tests in your forked repository" section on the Developer Tools page, but the main content was rewritten to meet the objective.
* Also fixed the URL to developer-tools.html so it is resolved by the parser (which converts it into a relative URI) instead of being a hard-coded absolute URL.

> Contribution guide is stale
> ----------------------------
>
> Key: SPARK-37996
> URL: https://issues.apache.org/jira/browse/SPARK-37996
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 3.2.0
> Reporter: Khalid Mammadov
> Priority: Minor
>
> The contribution guide mentions the link below for testing a local repo before raising a PR, but the process has changed and the documentation does not reflect it:
> https://spark.apache.org/developer-tools.html#github-workflow-tests
> Only by digging into the git log of [.github/workflows/build_and_test.yml|https://github.com/apache/spark/commit/2974b70d1efd4b1c5cfe7e2467766f0a9a1fec82#diff-48c0ee97c53013d18d6bbae44648f7fab9af2e0bf5b0dc1ca761e18ec5c478f2] did I manage to find what the new process is. It was changed in [https://github.com/apache/spark/pull/32092] but the documentation was not updated.
> I am happy to contribute a fix, but apparently [https://spark.apache.org/developer-tools.html] is hosted on the Apache website rather than in the Spark source code.
[jira] [Commented] (SPARK-37996) Contribution guide is stale
[ https://issues.apache.org/jira/browse/SPARK-37996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482946#comment-17482946 ]

Khalid Mammadov commented on SPARK-37996:
------------------------------------------

Sure, working on it. Thanks for the repo link!

> Contribution guide is stale
> ----------------------------
>
> Key: SPARK-37996
> URL: https://issues.apache.org/jira/browse/SPARK-37996
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 3.2.0
> Reporter: Khalid Mammadov
> Priority: Minor
>
> The contribution guide mentions the link below for testing a local repo before raising a PR, but the process has changed and the documentation does not reflect it:
> https://spark.apache.org/developer-tools.html#github-workflow-tests
> Only by digging into the git log of [.github/workflows/build_and_test.yml|https://github.com/apache/spark/commit/2974b70d1efd4b1c5cfe7e2467766f0a9a1fec82#diff-48c0ee97c53013d18d6bbae44648f7fab9af2e0bf5b0dc1ca761e18ec5c478f2] did I manage to find what the new process is. It was changed in [https://github.com/apache/spark/pull/32092] but the documentation was not updated.
> I am happy to contribute a fix, but apparently [https://spark.apache.org/developer-tools.html] is hosted on the Apache website rather than in the Spark source code.
[jira] [Created] (SPARK-38025) Improve test suite ExternalCatalogSuite
Khalid Mammadov created SPARK-38025:
------------------------------------

Summary: Improve test suite ExternalCatalogSuite
Key: SPARK-38025
URL: https://issues.apache.org/jira/browse/SPARK-38025
Project: Spark
Issue Type: Improvement
Components: Tests
Affects Versions: 3.3
Reporter: Khalid Mammadov

The test suite *ExternalCatalogSuite.scala* can be simplified by replacing repetitive code with an already-available utility function, with some minor changes. This will reduce redundant code, simplify the suite, and improve readability.
[jira] [Created] (SPARK-37996) Contribution guide is stale
Khalid Mammadov created SPARK-37996:
------------------------------------

Summary: Contribution guide is stale
Key: SPARK-37996
URL: https://issues.apache.org/jira/browse/SPARK-37996
Project: Spark
Issue Type: Improvement
Components: Documentation
Affects Versions: 3.2.0
Reporter: Khalid Mammadov

The contribution guide mentions the link below for testing a local repo before raising a PR, but the process has changed and the documentation does not reflect it:
https://spark.apache.org/developer-tools.html#github-workflow-tests

Only by digging into the git log of [.github/workflows/build_and_test.yml|https://github.com/apache/spark/commit/2974b70d1efd4b1c5cfe7e2467766f0a9a1fec82#diff-48c0ee97c53013d18d6bbae44648f7fab9af2e0bf5b0dc1ca761e18ec5c478f2] did I manage to find what the new process is. It was changed in [https://github.com/apache/spark/pull/32092] but the documentation was not updated.

I am happy to contribute a fix, but apparently [https://spark.apache.org/developer-tools.html] is hosted on the Apache website rather than in the Spark source code.
[jira] [Created] (SPARK-37991) Improve test case inside SQL\Catalyst\Catalog
Khalid Mammadov created SPARK-37991:
------------------------------------

Summary: Improve test case inside SQL\Catalyst\Catalog
Key: SPARK-37991
URL: https://issues.apache.org/jira/browse/SPARK-37991
Project: Spark
Issue Type: Improvement
Components: Tests
Affects Versions: 3.2.0
Reporter: Khalid Mammadov

The test case *basic create and list partitions* inside *ExternalCatalogSuite.scala* can be simplified by replacing its set-up code with a test utility function from the same suite: *newBasicCatalog()*.