[jira] [Commented] (SPARK-47042) Fix `spark-common-utils` module to have explicit `commons-lang3` dependency

2024-02-14 Thread William Wong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817483#comment-17817483
 ] 

William Wong commented on SPARK-47042:
--

The ticket was created via cloning. Somehow, the cloned ticket kept the 
assignee, and I cannot update the assignee back to myself. Sorry for the 
confusion caused. 

> Fix `spark-common-utils` module to have explicit `commons-lang3` dependency
> ---
>
> Key: SPARK-47042
> URL: https://issues.apache.org/jira/browse/SPARK-47042
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: William Wong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The following code depends on `commons-lang3` explicitly. However, the 
> `common-utils` module is missing the related explicit dependency.
> {code:java}
> ~/dev/sources/spark/common/utils$ grep -R lang3 * | grep import
> src/main/scala/org/apache/spark/util/MavenUtils.scala:import 
> org.apache.commons.lang3.StringUtils
> src/main/scala/org/apache/spark/util/ClosureCleaner.scala:import 
> org.apache.commons.lang3.ClassUtils
> src/main/java/org/apache/spark/network/util/JavaUtils.java:import 
> org.apache.commons.lang3.SystemUtils; {code}






[jira] [Commented] (SPARK-47043) Fix `spark-common-utils` module to have explicit `jackson-core` and `jackson-annotations`dependency

2024-02-14 Thread William Wong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817482#comment-17817482
 ] 

William Wong commented on SPARK-47043:
--

The ticket was created via cloning. Somehow, the cloned ticket kept the 
assignee, and I cannot update the assignee back to myself. Sorry for the 
confusion caused. 

> Fix `spark-common-utils` module to have explicit `jackson-core` and 
> `jackson-annotations`dependency
> ---
>
> Key: SPARK-47043
> URL: https://issues.apache.org/jira/browse/SPARK-47043
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: William Wong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The following Scala code depends on `jackson-core` and `jackson-annotations` 
> explicitly. However, the `spark-common-utils` module is missing the related explicit dependencies.
> {code:java}
> ~/dev/sources/spark$ grep -R jackson.core ./common/utils/* | grep import
> ./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala:import
>  com.fasterxml.jackson.core.`type`.TypeReference
> ./common/utils/src/main/scala/org/apache/spark/util/JsonUtils.scala:import 
> com.fasterxml.jackson.core.{JsonEncoding, JsonGenerator}
> ~/dev/sources/spark$ grep -R jackson.annotation ./common/utils/* | grep import
> ./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala:import
>  com.fasterxml.jackson.annotation.JsonIgnore
> {code}






[jira] [Updated] (SPARK-47043) Fix `spark-common-utils` module to have explicit `jackson-core` and `jackson-annotations`dependency

2024-02-14 Thread William Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-47043:
-
Labels:   (was: pull-request-available)

> Fix `spark-common-utils` module to have explicit `jackson-core` and 
> `jackson-annotations`dependency
> ---
>
> Key: SPARK-47043
> URL: https://issues.apache.org/jira/browse/SPARK-47043
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: William Wong
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 4.0.0
>
>
> The following Scala code depends on `jackson-core` and `jackson-annotations` 
> explicitly. However, the `spark-common-utils` module is missing the related explicit dependencies.
> {code:java}
> ~/dev/sources/spark$ grep -R jackson.core ./common/utils/* | grep import
> ./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala:import
>  com.fasterxml.jackson.core.`type`.TypeReference
> ./common/utils/src/main/scala/org/apache/spark/util/JsonUtils.scala:import 
> com.fasterxml.jackson.core.{JsonEncoding, JsonGenerator}
> ~/dev/sources/spark$ grep -R jackson.annotation ./common/utils/* | grep import
> ./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala:import
>  com.fasterxml.jackson.annotation.JsonIgnore
> {code}






[jira] [Updated] (SPARK-47043) Fix `spark-common-utils` module to have explicit `jackson-core` and `jackson-annotations`dependency

2024-02-14 Thread William Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-47043:
-
Description: 
The following Scala code depends on `jackson-core` and `jackson-annotations` 
explicitly. However, the `spark-common-utils` module is missing the related explicit dependencies.
{code:java}
~/dev/sources/spark$ grep -R jackson.core ./common/utils/* | grep import
./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala:import
 com.fasterxml.jackson.core.`type`.TypeReference
./common/utils/src/main/scala/org/apache/spark/util/JsonUtils.scala:import 
com.fasterxml.jackson.core.{JsonEncoding, JsonGenerator}
~/dev/sources/spark$ grep -R jackson.annotation ./common/utils/* | grep import
./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala:import
 com.fasterxml.jackson.annotation.JsonIgnore
{code}

  was:
The following Scala code depends on `jackson-core` and `jackson-annotations` 
explicitly. However, the `spark-common-utils` module is missing the related explicit dependencies.
{code:java}
~/dev/sources/spark$ grep -R jackson.core ./common/utils/* | grep import
./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scalaimport
 com.fasterxml.jackson.core.`type`.TypeReference
./common/utils/src/main/scala/org/apache/spark/util/JsonUtils.scala:import 
com.fasterxml.jackson.core.{JsonEncoding, JsonGenerator}
~/dev/sources/spark$ grep -R jackson.annotation ./common/utils/* | grep import
./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala:import
 com.fasterxml.jackson.annotation.JsonIgnore
{code}


> Fix `spark-common-utils` module to have explicit `jackson-core` and 
> `jackson-annotations`dependency
> ---
>
> Key: SPARK-47043
> URL: https://issues.apache.org/jira/browse/SPARK-47043
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: William Wong
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The following Scala code depends on `jackson-core` and `jackson-annotations` 
> explicitly. However, the `spark-common-utils` module is missing the related explicit dependencies.
> {code:java}
> ~/dev/sources/spark$ grep -R jackson.core ./common/utils/* | grep import
> ./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala:import
>  com.fasterxml.jackson.core.`type`.TypeReference
> ./common/utils/src/main/scala/org/apache/spark/util/JsonUtils.scala:import 
> com.fasterxml.jackson.core.{JsonEncoding, JsonGenerator}
> ~/dev/sources/spark$ grep -R jackson.annotation ./common/utils/* | grep import
> ./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala:import
>  com.fasterxml.jackson.annotation.JsonIgnore
> {code}
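
For illustration (this sketch is not taken from the linked pull request): declaring the dependencies explicitly means listing the artifacts that the module's own sources import, instead of relying on them arriving transitively. In sbt-style notation the declarations would look roughly like the block below; the actual change is a pair of `<dependency>` entries in `common/utils/pom.xml`, and 2.16.1 is the version reported by `mvn dependency:analyze` elsewhere in this thread.
{code:java}
// Illustrative sketch only; Spark's build declares these in common/utils/pom.xml.
libraryDependencies ++= Seq(
  "com.fasterxml.jackson.core" % "jackson-core"        % "2.16.1",
  "com.fasterxml.jackson.core" % "jackson-annotations" % "2.16.1"
)
{code}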






[jira] [Updated] (SPARK-47043) Fix `spark-common-utils` module to have explicit `jackson-core` and `jackson-annotations`dependency

2024-02-14 Thread William Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-47043:
-
Description: 
The following Scala code depends on `jackson-core` and `jackson-annotations` 
explicitly. However, the `spark-common-utils` module is missing the related explicit dependencies.
{code:java}
~/dev/sources/spark$ grep -R jackson.core ./common/utils/* | grep import
./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scalaimport
 com.fasterxml.jackson.core.`type`.TypeReference
./common/utils/src/main/scala/org/apache/spark/util/JsonUtils.scala:import 
com.fasterxml.jackson.core.{JsonEncoding, JsonGenerator}
~/dev/sources/spark$ grep -R jackson.annotation ./common/utils/* | grep import
./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala:import
 com.fasterxml.jackson.annotation.JsonIgnore
{code}

  was:
The following code depends on `commons-lang3` explicitly. However, the 
`common-utils` module is missing the related explicit dependency.
{code:java}
~/dev/sources/spark/common/utils$ grep -R lang3 * | grep import
src/main/scala/org/apache/spark/util/MavenUtils.scala:import 
org.apache.commons.lang3.StringUtils
src/main/scala/org/apache/spark/util/ClosureCleaner.scala:import 
org.apache.commons.lang3.ClassUtils
src/main/java/org/apache/spark/network/util/JavaUtils.java:import 
org.apache.commons.lang3.SystemUtils; {code}


> Fix `spark-common-utils` module to have explicit `jackson-core` and 
> `jackson-annotations`dependency
> ---
>
> Key: SPARK-47043
> URL: https://issues.apache.org/jira/browse/SPARK-47043
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: William Wong
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The following Scala code depends on `jackson-core` and `jackson-annotations` 
> explicitly. However, the `spark-common-utils` module is missing the related explicit dependencies.
> {code:java}
> ~/dev/sources/spark$ grep -R jackson.core ./common/utils/* | grep import
> ./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scalaimport
>  com.fasterxml.jackson.core.`type`.TypeReference
> ./common/utils/src/main/scala/org/apache/spark/util/JsonUtils.scala:import 
> com.fasterxml.jackson.core.{JsonEncoding, JsonGenerator}
> ~/dev/sources/spark$ grep -R jackson.annotation ./common/utils/* | grep import
> ./common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala:import
>  com.fasterxml.jackson.annotation.JsonIgnore
> {code}






[jira] [Updated] (SPARK-47042) Fix `spark-common-utils` module to have explicit `commons-lang3` dependency

2024-02-14 Thread William Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-47042:
-
Summary: Fix `spark-common-utils` module to have explicit `commons-lang3` 
dependency  (was: Fix `common-utils` module to have explicit `commons-lang3` 
dependency)

> Fix `spark-common-utils` module to have explicit `commons-lang3` dependency
> ---
>
> Key: SPARK-47042
> URL: https://issues.apache.org/jira/browse/SPARK-47042
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: William Wong
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The following code depends on `commons-lang3` explicitly. However, the 
> `common-utils` module is missing the related explicit dependency.
> {code:java}
> ~/dev/sources/spark/common/utils$ grep -R lang3 * | grep import
> src/main/scala/org/apache/spark/util/MavenUtils.scala:import 
> org.apache.commons.lang3.StringUtils
> src/main/scala/org/apache/spark/util/ClosureCleaner.scala:import 
> org.apache.commons.lang3.ClassUtils
> src/main/java/org/apache/spark/network/util/JavaUtils.java:import 
> org.apache.commons.lang3.SystemUtils; {code}






[jira] [Updated] (SPARK-47043) Fix `spark-common-utils` module to have explicit `jackson-core` and `jackson-annotations`dependency

2024-02-14 Thread William Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-47043:
-
Summary: Fix `spark-common-utils` module to have explicit `jackson-core` 
and `jackson-annotations`dependency  (was: Fix `common-utils` module to have 
explicit `jackson-core` and `jackson-annotations`dependency)

> Fix `spark-common-utils` module to have explicit `jackson-core` and 
> `jackson-annotations`dependency
> ---
>
> Key: SPARK-47043
> URL: https://issues.apache.org/jira/browse/SPARK-47043
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: William Wong
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The following code depends on `commons-lang3` explicitly. However, the 
> `common-utils` module is missing the related explicit dependency.
> {code:java}
> ~/dev/sources/spark/common/utils$ grep -R lang3 * | grep import
> src/main/scala/org/apache/spark/util/MavenUtils.scala:import 
> org.apache.commons.lang3.StringUtils
> src/main/scala/org/apache/spark/util/ClosureCleaner.scala:import 
> org.apache.commons.lang3.ClassUtils
> src/main/java/org/apache/spark/network/util/JavaUtils.java:import 
> org.apache.commons.lang3.SystemUtils; {code}






[jira] [Created] (SPARK-47043) Fix `common-utils` module to have explicit `jackson-core` and `jackson-annotations`dependency

2024-02-14 Thread William Wong (Jira)
William Wong created SPARK-47043:


 Summary: Fix `common-utils` module to have explicit `jackson-core` 
and `jackson-annotations`dependency
 Key: SPARK-47043
 URL: https://issues.apache.org/jira/browse/SPARK-47043
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Tests
Affects Versions: 4.0.0
Reporter: William Wong
Assignee: Dongjoon Hyun
 Fix For: 4.0.0


The following code depends on `commons-lang3` explicitly. However, the 
`common-utils` module is missing the related explicit dependency.
{code:java}
~/dev/sources/spark/common/utils$ grep -R lang3 * | grep import
src/main/scala/org/apache/spark/util/MavenUtils.scala:import 
org.apache.commons.lang3.StringUtils
src/main/scala/org/apache/spark/util/ClosureCleaner.scala:import 
org.apache.commons.lang3.ClassUtils
src/main/java/org/apache/spark/network/util/JavaUtils.java:import 
org.apache.commons.lang3.SystemUtils; {code}






[jira] [Commented] (SPARK-47042) Fix `common-utils` module to have explicit `commons-lang3` dependency

2024-02-14 Thread William Wong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817403#comment-17817403
 ] 

William Wong commented on SPARK-47042:
--

Apart from the missing commons-lang3 dependency, the 'common-utils' module is also 
missing explicit dependencies on 'jackson-core' and 'jackson-annotations'. 

Should I also fix them in this JIRA, or create another Jira for fixing them? 
{code:java}
~/dev/sources/spark$ ./build/mvn -T 2C -pl ./common/utils/pom.xml 
dependency:analyze
Using `mvn` from path: 
/home/william/dev/sources/spark/build/apache-maven-3.9.6/bin/mvn
[INFO] Scanning for projects...
[INFO]
[INFO] Using the MultiThreadedBuilder implementation with a thread count of 24
[INFO]
[INFO] --< org.apache.spark:spark-common-utils_2.13 >--
[INFO] Building Spark Project Common Utils 4.0.0-SNAPSHOT
[INFO]   from pom.xml
[INFO] [ jar ]-
[INFO]
[INFO] >>> dependency:3.6.0:analyze (default-cli) > test-compile @ 
spark-common-utils_2.13 >>>
[INFO]
[INFO] --- enforcer:3.3.0:enforce (enforce-versions) @ spark-common-utils_2.13 
---
[INFO] Rule 0: org.apache.maven.enforcer.rules.version.RequireMavenVersion 
passed
[INFO] Rule 1: org.apache.maven.enforcer.rules.version.RequireJavaVersion passed
[INFO] Rule 2: org.apache.maven.enforcer.rules.dependency.BannedDependencies 
passed
[INFO] Rule 3: 
org.codehaus.mojo.extraenforcer.dependencies.EnforceBytecodeVersion passed
[INFO]
[INFO] --- enforcer:3.3.0:enforce (enforce-no-duplicate-dependencies) @ 
spark-common-utils_2.13 ---
[INFO] Rule 0: 
org.apache.maven.enforcer.rules.BanDuplicatePomDependencyVersions passed
[INFO]
[INFO] --- antrun:3.1.0:run (choose-shell-and-script) @ spark-common-utils_2.13 
---
[INFO] Executing tasks
[WARNING]      [echo] Shell to use for generating spark-version-info.properties 
file =
[WARNING]      [echo]                   bash
[WARNING]      [echo] Script to use for generating 
spark-version-info.properties file =
[WARNING]      [echo]                   spark-build-info
[INFO] Executed tasks
[INFO]
[INFO] --- scalafmt:1.1.1640084764.9f463a9:format (default) @ 
spark-common-utils_2.13 ---
[WARNING] format.skipSources set, ignoring main directories
[WARNING] format.skipTestSources set, ignoring validateOnly directories
[WARNING] No sources specified, skipping formatting
[INFO]
[INFO] --- scala:4.7.1:add-source (eclipse-add-source) @ 
spark-common-utils_2.13 ---
[INFO] Add Source directory: 
/home/william/dev/sources/spark/common/utils/src/main/scala
[INFO] Add Test Source directory: 
/home/william/dev/sources/spark/common/utils/src/test/scala
[INFO]
[INFO] --- dependency:3.6.0:build-classpath (default-cli) @ 
spark-common-utils_2.13 ---
[INFO] Dependencies classpath:

.


[INFO]
[INFO] <<< dependency:3.6.0:analyze (default-cli) < test-compile @ 
spark-common-utils_2.13 <<<
[INFO]
[INFO]
[INFO] --- dependency:3.6.0:analyze (default-cli) @ spark-common-utils_2.13 ---
[WARNING] Used undeclared dependencies found:
[WARNING]    com.fasterxml.jackson.core:jackson-annotations:jar:2.16.1:compile
[WARNING]    org.apache.commons:commons-lang3:jar:3.14.0:compile
[WARNING]    com.fasterxml.jackson.core:jackson-core:jar:2.16.1:compile
[WARNING]    org.scala-lang:scala-library:jar:2.13.12:compile
[WARNING]    org.scalatest:scalatest-funsuite_2.13:jar:3.2.17:test
[WARNING]    org.scalactic:scalactic_2.13:jar:3.2.17:test
[WARNING]    org.scalatest:scalatest-compatible:jar:3.2.17:test
[WARNING]    org.scalatest:scalatest-core_2.13:jar:3.2.17:test
[WARNING] Unused declared dependencies found:
[WARNING]    
com.fasterxml.jackson.module:jackson-module-scala_2.13:jar:2.16.1:compile
[WARNING]    oro:oro:jar:2.0.8:compile
[WARNING]    org.slf4j:jul-to-slf4j:jar:2.0.11:compile
[WARNING]    org.slf4j:jcl-over-slf4j:jar:2.0.11:compile
[WARNING]    org.apache.logging.log4j:log4j-slf4j2-impl:jar:2.22.1:compile
[WARNING]    org.apache.logging.log4j:log4j-1.2-api:jar:2.22.1:compile
[WARNING]    org.spark-project.spark:unused:jar:1.0.0:compile
[WARNING]    org.scalatest:scalatest_2.13:jar:3.2.17:test
[WARNING]    org.scalatestplus:scalacheck-1-17_2.13:jar:3.2.17.0:test
[WARNING]    org.scalatestplus:mockito-4-11_2.13:jar:3.2.17.0:test
[WARNING]    org.scalatestplus:selenium-4-12_2.13:jar:3.2.17.0:test
[WARNING]    org.junit.jupiter:junit-jupiter:jar:5.9.3:test
[WARNING]    net.aichler:jupiter-interface:jar:0.11.1:test
[WARNING] Non-test scoped test only dependencies found:
[WARNING]    commons-io:commons-io:jar:2.15.1:compile
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time:  8.474 s (Wall Clock)
[INFO] Finished at: 2024-02-14T22:44:45+08:00
[INFO] 
{code}

> Fix 

[jira] [Updated] (SPARK-47042) Fix `common-utils` module to have explicit `commons-lang3` dependency

2024-02-14 Thread William Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-47042:
-
Description: 
The following code depends on `commons-lang3` explicitly. However, the 
`common-utils` module is missing the related explicit dependency.
{code:java}
~/dev/sources/spark/common/utils$ grep -R lang3 * | grep import
src/main/scala/org/apache/spark/util/MavenUtils.scala:import 
org.apache.commons.lang3.StringUtils
src/main/scala/org/apache/spark/util/ClosureCleaner.scala:import 
org.apache.commons.lang3.ClassUtils
src/main/java/org/apache/spark/network/util/JavaUtils.java:import 
org.apache.commons.lang3.SystemUtils; {code}
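
For illustration (this sketch is not taken from the linked pull request): in sbt-style notation, the explicit declaration would look roughly like the line below; the actual change is a `<dependency>` entry on `org.apache.commons:commons-lang3` in `common/utils/pom.xml`, and 3.14.0 is the version reported by `mvn dependency:analyze` earlier in this thread.
{code:java}
// Illustrative sketch only; Spark's build declares this in common/utils/pom.xml.
libraryDependencies += "org.apache.commons" % "commons-lang3" % "3.14.0"
{code}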

> Fix `common-utils` module to have explicit `commons-lang3` dependency
> -
>
> Key: SPARK-47042
> URL: https://issues.apache.org/jira/browse/SPARK-47042
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: William Wong
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 4.0.0
>
>
> The following code depends on `commons-lang3` explicitly. However, the 
> `common-utils` module is missing the related explicit dependency.
> {code:java}
> ~/dev/sources/spark/common/utils$ grep -R lang3 * | grep import
> src/main/scala/org/apache/spark/util/MavenUtils.scala:import 
> org.apache.commons.lang3.StringUtils
> src/main/scala/org/apache/spark/util/ClosureCleaner.scala:import 
> org.apache.commons.lang3.ClassUtils
> src/main/java/org/apache/spark/network/util/JavaUtils.java:import 
> org.apache.commons.lang3.SystemUtils; {code}






[jira] [Updated] (SPARK-47042) Fix `common-utils` module to have explicit `commons-lang3` dependency

2024-02-14 Thread William Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-47042:
-
Affects Version/s: 4.0.0
   (was: 3.3.0)
   (was: 3.5.0)
   (was: 3.4.2)
   (was: 3.3.4)

> Fix `common-utils` module to have explicit `commons-lang3` dependency
> -
>
> Key: SPARK-47042
> URL: https://issues.apache.org/jira/browse/SPARK-47042
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: William Wong
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1, 3.4.3
>
>







[jira] [Updated] (SPARK-47042) Fix `common-utils` module to have explicit `commons-lang3` dependency

2024-02-14 Thread William Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-47042:
-
Labels:   (was: pull-request-available)

> Fix `common-utils` module to have explicit `commons-lang3` dependency
> -
>
> Key: SPARK-47042
> URL: https://issues.apache.org/jira/browse/SPARK-47042
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: William Wong
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 4.0.0, 3.5.1, 3.4.3
>
>







[jira] [Updated] (SPARK-47042) Fix `common-utils` module to have explicit `commons-lang3` dependency

2024-02-14 Thread William Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-47042:
-
Fix Version/s: (was: 3.5.1)
   (was: 3.4.3)

> Fix `common-utils` module to have explicit `commons-lang3` dependency
> -
>
> Key: SPARK-47042
> URL: https://issues.apache.org/jira/browse/SPARK-47042
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: William Wong
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-47042) Fix `common-utils` module to have explicit `commons-lang3` dependency

2024-02-14 Thread William Wong (Jira)
William Wong created SPARK-47042:


 Summary: Fix `common-utils` module to have explicit 
`commons-lang3` dependency
 Key: SPARK-47042
 URL: https://issues.apache.org/jira/browse/SPARK-47042
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Tests
Affects Versions: 3.3.0, 3.4.2, 3.5.0, 3.3.4
Reporter: William Wong
Assignee: Dongjoon Hyun
 Fix For: 4.0.0, 3.5.1, 3.4.3









[jira] [Commented] (SPARK-44319) Migrate jersey 2 to jersey 3

2024-02-03 Thread William Wong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814040#comment-17814040
 ] 

William Wong commented on SPARK-44319:
--

SPARK-46938 upgrades Jetty from version 10 to version 11. Maybe we should 
mark SPARK-46938 as a dependency of this Jira as well. 

> Migrate jersey 2 to jersey 3
> 
>
> Key: SPARK-44319
> URL: https://issues.apache.org/jira/browse/SPARK-44319
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Commented] (SPARK-46122) Disable spark.sql.legacy.createHiveTableByDefault by default

2024-02-03 Thread William Wong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814036#comment-17814036
 ] 

William Wong commented on SPARK-46122:
--

Hi [~yumwang] ,

I can help with this JIRA. However, before making any changes to the existing 
Spark behavior, may I know whether there has already been any discussion of this 
proposed change? Why would we like to change the default behavior? 

Thanks and regards, 
William

> Disable spark.sql.legacy.createHiveTableByDefault by default
> 
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Updated] (SPARK-28103) Cannot infer filters from union table with empty local relation table properly

2019-06-18 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-28103:
-
Description: 
Basically, the constraints of a union table can become empty if any 
subtable is turned into an empty local relation. The side effect is that filters 
cannot be inferred correctly (by InferFiltersFromConstraints). 

 
 We may reproduce the issue with the following setup:

1) Prepare two tables: 

 
{code:java}
spark.sql("CREATE TABLE IF NOT EXISTS table1(id string, val string) USING 
PARQUET");
spark.sql("CREATE TABLE IF NOT EXISTS table2(id string, val string) USING 
PARQUET");{code}
 

2) Create a union view on table1. 
{code:java}
spark.sql("""
     | CREATE VIEW partitioned_table_1 AS
     | SELECT * FROM table1 WHERE id = 'a'
     | UNION ALL
     | SELECT * FROM table1 WHERE id = 'b'
     | UNION ALL
     | SELECT * FROM table1 WHERE id = 'c'
     | UNION ALL
     | SELECT * FROM table1 WHERE id NOT IN ('a','b','c')
     | """.stripMargin){code}
 

 3) View the optimized plan of this SQL. The filter `t2.id = 'a'` cannot be 
inferred. We can see that the constraints of the left (union) side are empty. 
{code:java}
scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id 
= t2.id AND t1.id = 'a'").queryExecution.optimizedPlan

res39: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Join Inner, (id#0 = id#4)
:- Union
:  :- Filter (isnotnull(id#0) && (id#0 = a))
:  :  +- Relation[id#0,val#1] parquet
:  :- LocalRelation <empty>, [id#0, val#1]
:  :- LocalRelation <empty>, [id#0, val#1]
:  +- Filter ((isnotnull(id#0) && NOT id#0 IN (a,b,c)) && (id#0 = a))
:     +- Relation[id#0,val#1] parquet
+- Filter isnotnull(id#4)
   +- Relation[id#4,val#5] parquet

scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id 
= t2.id AND t1.id = 
'a'").queryExecution.optimizedPlan.children(0).constraints
res40: org.apache.spark.sql.catalyst.expressions.ExpressionSet = Set()
 
{code}
 

4) Modify the query to avoid the empty local relations. The filter `t2.id IN 
('a','b','c','d')` is then inferred properly, and the constraints of the left 
(union) side are no longer empty. 
{code:java}
scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id 
= t2.id AND t1.id IN 
('a','b','c','d')").queryExecution.optimizedPlan
res42: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Join Inner, (id#0 = id#4)
:- Union
:  :- Filter ((isnotnull(id#0) && (id#0 = a)) && id#0 IN (a,b,c,d))
:  :  +- Relation[id#0,val#1] parquet
:  :- Filter ((isnotnull(id#0) && (id#0 = b)) && id#0 IN (a,b,c,d))
:  :  +- Relation[id#0,val#1] parquet
:  :- Filter ((isnotnull(id#0) && (id#0 = c)) && id#0 IN (a,b,c,d))
:  :  +- Relation[id#0,val#1] parquet
:  +- Filter ((NOT id#0 IN (a,b,c) && id#0 IN (a,b,c,d)) && isnotnull(id#0))
:     +- Relation[id#0,val#1] parquet
+- Filter ((id#4 IN (a,b,c,d) && ((isnotnull(id#4) && (((id#4 = a) || (id#4 = 
b)) || (id#4 = c))) || NOT id#4 IN (a,b,c))) && isnotnull(id#4))
   +- Relation[id#4,val#5] parquet
 
scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id 
= t2.id AND t1.id IN 
('a','b','c','d')").queryExecution.optimizedPlan.children(0).constraints
res44: org.apache.spark.sql.catalyst.expressions.ExpressionSet = 
Set(isnotnull(id#0), id#0 IN (a,b,c,d), ((((id#0 = a) || (id#0 = b)) || (id#0 = 
c)) || NOT id#0 IN (a,b,c)))
{code}
 

One possible workaround is to create a rule that removes all empty local 
relations from a union (a rough sketch of such a rule follows this paragraph). 
Alternatively, when we convert a relation into an empty local relation, we could 
preserve its constraints in the empty local relation as well. 
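
A rough sketch of the first workaround, assuming a Catalyst optimizer rule; the rule name and the empty-relation check below are illustrative only, and this is not the rule Spark actually ships:
{code:java}
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan, Union}
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical rule: drop empty LocalRelation children from a Union so that the
// remaining children still contribute constraints for InferFiltersFromConstraints.
object PruneEmptyUnionChildren extends Rule[LogicalPlan] {
  private def isEmptyLocal(p: LogicalPlan): Boolean = p match {
    case l: LocalRelation => l.data.isEmpty
    case _ => false
  }

  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    // Only rewrite when at least one child is empty and at least one is not.
    case u: Union if u.children.exists(isEmptyLocal) && !u.children.forall(isEmptyLocal) =>
      val kept = u.children.filterNot(isEmptyLocal)
      if (kept.size == 1) kept.head else u.copy(children = kept)
  }
}
{code}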

 

A side note: expressions in the optimized plan are not fully simplified. For example, 
the expression 
{code:java}
((id#4 IN (a,b,c,d) && ((isnotnull(id#4) && (((id#4 = a) || (id#4 = b)) || 
(id#4 = c))) || NOT id#4 IN (a,b,c))) && isnotnull(id#4)){code}
could be further optimized into 
{code:java}
(isnotnull(id#4) && (id = d)){code}
We may implement the following rules to simplify such expressions (a rough 
Catalyst-style sketch follows this list). 

1) Convert all 'equal' operators into 'in' operators, and then group all 'in' 
and 'not in' expressions by attribute reference:

    i) eq(a,val) => in(a,val::Nil)

2) Merge those 'in' and 'not in' operators, for example:

    i)  or(in(a,list1),in(a,list2)) => in(a, list1 ++ list2)

    ii) or(in(a,list1), not(in(a,list2))) => not(in(a, list2 -- list1))

   iii) and(in(a,list1),in(a,list2)) => in(a, list1 intersect list2)

   iv) and(in(a,list1),not(in(a,list2))) => in(a, list1 -- list2)

3) Revert an 'in' operator to 'equal' if there is only one element left in the list:

  i) in(a,list) if list.size == 1 => eq(a,list.head)
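
A minimal Catalyst-style sketch of rule 2.i above (merging two 'in' lists over the same attribute joined by 'or'); the rule name is illustrative and the sketch covers only this one case, not the full set of rewrites listed:
{code:java}
import org.apache.spark.sql.catalyst.expressions.{In, Or}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical rule illustrating case 2.i: or(in(a,list1), in(a,list2)) => in(a, list1 ++ list2)
object MergeDisjunctiveInLists extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case Or(In(a, list1), In(b, list2)) if a.semanticEquals(b) =>
      In(a, (list1 ++ list2).distinct)
  }
}
{code}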

 

 

  was:
Basically, the constraints of a union table can become empty if any 
subtable is turned into an empty local relation. The side effect is that filters 
cannot be 

[jira] [Updated] (SPARK-28103) Cannot infer filters from union table with empty local relation table properly

2019-06-18 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-28103:
-
Description: 
Basically, the constraints of a union table can become empty if any 
subtable is turned into an empty local relation. The side effect is that filters 
cannot be inferred correctly (by InferFiltersFromConstraints). 

 
 We may reproduce the issue with the following setup:

1) Prepare two tables: 

 
{code:java}
spark.sql("CREATE TABLE IF NOT EXISTS table1(id string, val string) USING 
PARQUET");
spark.sql("CREATE TABLE IF NOT EXISTS table2(id string, val string) USING 
PARQUET");{code}
 

2) Create a union view on table1. 
{code:java}
spark.sql("""
     | CREATE VIEW partitioned_table_1 AS
     | SELECT * FROM table1 WHERE id = 'a'
     | UNION ALL
     | SELECT * FROM table1 WHERE id = 'b'
     | UNION ALL
     | SELECT * FROM table1 WHERE id = 'c'
     | UNION ALL
     | SELECT * FROM table1 WHERE id NOT IN ('a','b','c')
     | """.stripMargin){code}
 

 3) View the optimized plan of this SQL. The filter `t2.id = 'a'` cannot be 
inferred. We can see that the constraints of the left (union) side are empty. 
{code:java}
scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id 
= t2.id AND t1.id = 'a'").queryExecution.optimizedPlan

res39: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Join Inner, (id#0 = id#4)
:- Union
:  :- Filter (isnotnull(id#0) && (id#0 = a))
:  :  +- Relation[id#0,val#1] parquet
:  :- LocalRelation <empty>, [id#0, val#1]
:  :- LocalRelation <empty>, [id#0, val#1]
:  +- Filter ((isnotnull(id#0) && NOT id#0 IN (a,b,c)) && (id#0 = a))
:     +- Relation[id#0,val#1] parquet
+- Filter isnotnull(id#4)
   +- Relation[id#4,val#5] parquet

scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id 
= t2.id AND t1.id = 
'a'").queryExecution.optimizedPlan.children(0).constraints
res40: org.apache.spark.sql.catalyst.expressions.ExpressionSet = Set()
 
{code}
 

4) Modify the query to avoid the empty local relations. The filter `t2.id IN 
('a','b','c','d')` is then inferred properly, and the constraints of the left 
(union) side are no longer empty. 
{code:java}
scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id 
= t2.id AND t1.id IN 
('a','b','c','d')").queryExecution.optimizedPlan
res42: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Join Inner, (id#0 = id#4)
:- Union
:  :- Filter ((isnotnull(id#0) && (id#0 = a)) && id#0 IN (a,b,c,d))
:  :  +- Relation[id#0,val#1] parquet
:  :- Filter ((isnotnull(id#0) && (id#0 = b)) && id#0 IN (a,b,c,d))
:  :  +- Relation[id#0,val#1] parquet
:  :- Filter ((isnotnull(id#0) && (id#0 = c)) && id#0 IN (a,b,c,d))
:  :  +- Relation[id#0,val#1] parquet
:  +- Filter ((NOT id#0 IN (a,b,c) && id#0 IN (a,b,c,d)) && isnotnull(id#0))
:     +- Relation[id#0,val#1] parquet
+- Filter ((id#4 IN (a,b,c,d) && ((isnotnull(id#4) && (((id#4 = a) || (id#4 = 
b)) || (id#4 = c))) || NOT id#4 IN (a,b,c))) && isnotnull(id#4))
   +- Relation[id#4,val#5] parquet
 
scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id 
= t2.id AND t1.id IN 
('a','b','c','d')").queryExecution.optimizedPlan.children(0).constraints
res44: org.apache.spark.sql.catalyst.expressions.ExpressionSet = 
Set(isnotnull(id#0), id#0 IN (a,b,c,d), ((((id#0 = a) || (id#0 = b)) || (id#0 = 
c)) || NOT id#0 IN (a,b,c)))
{code}
 

One possible workaround is to create a rule that removes all empty local 
relations from a union. Alternatively, when we convert a relation into an empty 
local relation, we could preserve its constraints in the empty local 
relation as well. 

 

A side note: expressions in the optimized plan are not fully simplified. For example, 
the expression 
{code:java}
((id#4 IN (a,b,c,d) && ((isnotnull(id#4) && (((id#4 = a) || (id#4 = b)) || 
(id#4 = c))) || NOT id#4 IN (a,b,c))) && isnotnull(id#4)){code}
could be further optimized into 
{code:java}
(isnotnull(id#4) && (id = d)){code}
We may implement another rule to:

1) convert all 'equal' operators into 'in' operators, and then group all 
expressions by attribute reference;

2) merge those 'in' (or 'not in') operators;

3) revert an 'in' operator to 'equal' if there is only one element left in the set. 

 

  was:
Basically, the constraints of a union table can become empty if any 
subtable is turned into an empty local relation. The side effect is that filters 
cannot be inferred correctly (by InferFiltersFromConstraints). 

 
We may reproduce the issue with the following setup:

1) Prepare two tables: 

 
{code:java}
spark.sql("CREATE TABLE IF NOT EXISTS table1(id string, val string) USING 
PARQUET");
spark.sql("CREATE TABLE IF NOT EXISTS table2(id string, val string) USING 
PARQUET");{code}
 

2) Create a union view on table1. 
{code:java}
spark.sql("""
     | CREATE VIEW 

[jira] [Created] (SPARK-28103) Cannot infer filters from union table with empty local relation table properly

2019-06-18 Thread William Wong (JIRA)
William Wong created SPARK-28103:


 Summary: Cannot infer filters from union table with empty local 
relation table properly
 Key: SPARK-28103
 URL: https://issues.apache.org/jira/browse/SPARK-28103
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.1, 2.3.2
Reporter: William Wong


Basically, the constraints of a union table can become empty if any 
subtable is turned into an empty local relation. The side effect is that filters 
cannot be inferred correctly (by InferFiltersFromConstraints). 

 
We may reproduce the issue with the following setup:

1) Prepare two tables: 

 
{code:java}
spark.sql("CREATE TABLE IF NOT EXISTS table1(id string, val string) USING 
PARQUET");
spark.sql("CREATE TABLE IF NOT EXISTS table2(id string, val string) USING 
PARQUET");{code}
 

2) Create a union view on table1. 
{code:java}
spark.sql("""
     | CREATE VIEW partitioned_table_1 AS
     | SELECT * FROM table1 WHERE id = 'a'
     | UNION ALL
     | SELECT * FROM table1 WHERE id = 'b'
     | UNION ALL
     | SELECT * FROM table1 WHERE id = 'c'
     | UNION ALL
     | SELECT * FROM table1 WHERE id NOT IN ('a','b','c')
     | """.stripMargin){code}
 

 3) View the optimized plan of this SQL. The filter `t2.id = 'a'` cannot be 
inferred. We can see that the constraints of the left (union) side are empty.

 
{code:java}
 
scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id 
= t2.id AND t1.id = 'a'").queryExecution.optimizedPlan

res39: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Join Inner, (id#0 = id#4)
:- Union
:  :- Filter (isnotnull(id#0) && (id#0 = a))
:  :  +- Relation[id#0,val#1] parquet
:  :- LocalRelation <empty>, [id#0, val#1]
:  :- LocalRelation <empty>, [id#0, val#1]
:  +- Filter ((isnotnull(id#0) && NOT id#0 IN (a,b,c)) && (id#0 = a))
:     +- Relation[id#0,val#1] parquet
+- Filter isnotnull(id#4)
   +- Relation[id#4,val#5] parquet

scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id 
= t2.id AND t1.id = 
'a'").queryExecution.optimizedPlan.children(0).constraints
res40: org.apache.spark.sql.catalyst.expressions.ExpressionSet = Set()
 
{code}
 

4) Modify the query to avoid the empty local relations. The filter `t2.id IN 
('a','b','c','d')` is then inferred properly, and the constraints of the left 
(union) side are no longer empty.

 

 
{code:java}
scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id 
= t2.id AND t1.id IN 
('a','b','c','d')").queryExecution.optimizedPlan
res42: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Join Inner, (id#0 = id#4)
:- Union
:  :- Filter ((isnotnull(id#0) && (id#0 = a)) && id#0 IN (a,b,c,d))
:  :  +- Relation[id#0,val#1] parquet
:  :- Filter ((isnotnull(id#0) && (id#0 = b)) && id#0 IN (a,b,c,d))
:  :  +- Relation[id#0,val#1] parquet
:  :- Filter ((isnotnull(id#0) && (id#0 = c)) && id#0 IN (a,b,c,d))
:  :  +- Relation[id#0,val#1] parquet
:  +- Filter ((NOT id#0 IN (a,b,c) && id#0 IN (a,b,c,d)) && isnotnull(id#0))
:     +- Relation[id#0,val#1] parquet
+- Filter ((id#4 IN (a,b,c,d) && ((isnotnull(id#4) && (((id#4 = a) || (id#4 = 
b)) || (id#4 = c))) || NOT id#4 IN (a,b,c))) && isnotnull(id#4))
   +- Relation[id#4,val#5] parquet
 
scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id 
= t2.id AND t1.id IN 
('a','b','c','d')").queryExecution.optimizedPlan.children(0).constraints
res44: org.apache.spark.sql.catalyst.expressions.ExpressionSet = 
Set(isnotnull(id#0), id#0 IN (a,b,c,d), ((((id#0 = a) || (id#0 = b)) || (id#0 = 
c)) || NOT id#0 IN (a,b,c)))
{code}
 

 

One possible workaround is to create a rule that removes all empty local 
relations from a union. Alternatively, when we convert a relation into an empty 
local relation, we could preserve its constraints in the empty local 
relation as well. 

 

A side note: expressions in the optimized plan are not fully simplified. For example, 
the expression 
{code:java}
((id#4 IN (a,b,c,d) && ((isnotnull(id#4) && (((id#4 = a) || (id#4 = b)) || 
(id#4 = c))) || NOT id#4 IN (a,b,c))) && isnotnull(id#4)){code}
could be further optimized into 
{code:java}
(isnotnull(id#4) && (id = d)){code}
We may implement another rule to:

1) convert all 'equal' operators into 'in' operators, and then group all 
expressions by attribute reference, 

3) merge all those 'in' (or 

[jira] [Commented] (SPARK-27772) SQLTestUtils Refactoring

2019-05-30 Thread William Wong (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852148#comment-16852148
 ] 

William Wong commented on SPARK-27772:
--

Hi [~hyukjin.kwon], I submitted the PR and added a test case to demonstrate how 
the change behaves. Basically, one example is that if someone creates a test by 
providing a 'null' table or cache to those withXXX methods, we hit a null 
pointer exception in the related closing block (the finally block). That null 
pointer exception would mask the true exception. 
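
A minimal, Spark-free sketch of the masking problem described above (the helper and the exceptions are illustrative, not the actual SQLTestUtils code): an unanticipated failure in the cleanup block replaces the real failure from the test body.
{code:java}
// Illustrative only: mimics the old try/finally cleanup pattern, not Spark code.
def withResource(name: String)(f: => Unit): Unit = {
  try f finally {
    // An unanticipated cleanup failure (e.g. a null name) escapes the finally block...
    if (name == null) throw new NullPointerException("cleanup failed")
  }
}

try {
  withResource(null) { throw new AssertionError("real test failure") }
} catch {
  // ...and the caller only sees the NPE; the AssertionError from the test body is lost.
  case t: Throwable => println(t)
}
{code}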

> SQLTestUtils Refactoring
> 
>
> Key: SPARK-27772
> URL: https://issues.apache.org/jira/browse/SPARK-27772
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: William Wong
>Priority: Minor
>
> The current `SQLTestUtils` provides many `withXXX` utility functions to clean 
> up tables/views/caches created for testing purposes. Some of those `withXXX` 
> functions ignore certain exceptions, like `NoSuchTableException`, in the 
> cleanup block (i.e., the finally block). 
>  
> {code:java}
> /**
>  * Drops temporary view `viewNames` after calling `f`.
>  */
> protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
>   try f finally {
> // If the test failed part way, we don't want to mask the failure by 
> failing to remove
> // temp views that never got created.
> try viewNames.foreach(spark.catalog.dropTempView) catch {
>   case _: NoSuchTableException =>
> }
>   }
> }
> {code}
> I believe this is not the best approach, because it is hard to anticipate which 
> exceptions should or should not be ignored.  
>  
> Java's `try-with-resources` statement does not mask an exception thrown in the 
> try block with an exception caught in the 'close()' call. An exception 
> caught in 'close()' is added as a suppressed exception 
> instead. That sounds like a better approach.
>  
> Therefore, I propose to standardise those 'withXXX' functions with the following 
> `tryWithFinally` function, which does something similar to Java's 
> try-with-resources statement. 
> {code:java}
> /**
> * Drops temporary view `viewNames` after calling `f`.
> */
> protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
>   tryWithFinally(f)(viewNames.foreach(spark.catalog.dropTempView))
> }
> /**
>  * Executes the given tryBlock and then the given finallyBlock no matter 
> whether tryBlock throws
>  * an exception. If both tryBlock and finallyBlock throw exceptions, the 
> exception thrown
>  * from the finallyBlock will be added to the exception thrown from tryBlock 
> as a
>  * suppressed exception. It helps to avoid masking the exception from tryBlock 
> with the exception
>  * from finallyBlock.
>  */
> private def tryWithFinally(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
>   var fromTryBlock: Throwable = null
>   try tryBlock catch {
> case cause: Throwable =>
>   fromTryBlock = cause
>   throw cause
>   } finally {
> if (fromTryBlock != null) {
>   try finallyBlock catch {
> case fromFinallyBlock: Throwable =>
>   fromTryBlock.addSuppressed(fromFinallyBlock)
>   throw fromTryBlock
>   }
> } else {
>   finallyBlock
> }
>   }
> }
> {code}
> If a feature is well written, we should not hit any exception in those closing 
> methods in test cases. The purpose of this proposal is to help developers 
> identify what actually breaks their tests.
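
A small standalone usage sketch of the proposed helper (tryWithFinally copied from the description above; the demo exceptions are illustrative): the caller still receives the failure from the test body, and the cleanup failure is attached via getSuppressed rather than replacing it.
{code:java}
// Self-contained sketch; tryWithFinally is copied from the proposal above.
def tryWithFinally(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock: Throwable = null
  try tryBlock catch {
    case cause: Throwable =>
      fromTryBlock = cause
      throw cause
  } finally {
    if (fromTryBlock != null) {
      try finallyBlock catch {
        case fromFinallyBlock: Throwable =>
          fromTryBlock.addSuppressed(fromFinallyBlock)
          throw fromTryBlock
      }
    } else {
      finallyBlock
    }
  }
}

try {
  tryWithFinally { throw new AssertionError("real test failure") } {
    throw new IllegalStateException("cleanup failed")
  }
} catch {
  case t: Throwable =>
    println(t)                       // java.lang.AssertionError: real test failure
    t.getSuppressed.foreach(println) // java.lang.IllegalStateException: cleanup failed
}
{code}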






[jira] [Updated] (SPARK-27772) SQLTestUtils Refactoring

2019-05-30 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-27772:
-
Description: 
The current `SQLTestUtils` provides many `withXXX` utility functions to clean up 
tables/views/caches created for testing purposes. Some of those `withXXX` 
functions ignore certain exceptions, like `NoSuchTableException`, in the 
cleanup block (i.e., the finally block). 

 
{code:java}
/**
 * Drops temporary view `viewNames` after calling `f`.
 */
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  try f finally {
// If the test failed part way, we don't want to mask the failure by 
failing to remove
// temp views that never got created.
try viewNames.foreach(spark.catalog.dropTempView) catch {
  case _: NoSuchTableException =>
}
  }
}
{code}
I believe this is not the best approach, because it is hard to anticipate which 
exceptions should or should not be ignored.  

 

Java's `try-with-resources` statement does not mask an exception thrown in the 
try block with an exception caught in the 'close()' call. An exception 
caught in 'close()' is added as a suppressed exception instead. 
That sounds like a better approach.

 

Therefore, I propose to standardise those 'withXXX' functions with the following 
`tryWithFinally` function, which does something similar to Java's 
try-with-resources statement. 
{code:java}
/**
* Drops temporary view `viewNames` after calling `f`.
*/
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  tryWithFinally(f)(viewNames.foreach(spark.catalog.dropTempView))
}


/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock will be added to the exception thrown from tryBlock as 
a
 * suppressed exception. It helps to avoid masking the exception from tryBlock 
with the exception
 * from finallyBlock.
 */
private def tryWithFinally(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock: Throwable = null
  try tryBlock catch {
case cause: Throwable =>
  fromTryBlock = cause
  throw cause
  } finally {
if (fromTryBlock != null) {
  try finallyBlock catch {
case fromFinallyBlock: Throwable =>
  fromTryBlock.addSuppressed(fromFinallyBlock)
  throw fromTryBlock
  }
} else {
  finallyBlock
}
  }
}
{code}
If a feature is well written, we should not hit any exception in those closing 
methods in test cases. The purpose of this proposal is to help developers 
identify what actually breaks their tests.

  was:
The current `SQLTestUtils` provides many `withXXX` utility functions to clean up 
tables/views/caches created for testing purposes. Some of those `withXXX` 
functions ignore certain exceptions, like `NoSuchTableException`, in the 
cleanup block (i.e., the finally block). 

 
{code:java}
/**
 * Drops temporary view `viewNames` after calling `f`.
 */
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  try f finally {
// If the test failed part way, we don't want to mask the failure by 
failing to remove
// temp views that never got created.
try viewNames.foreach(spark.catalog.dropTempView) catch {
  case _: NoSuchTableException =>
}
  }
}
{code}
Maybe it is not the best approach, because it is hard to anticipate which 
exceptions should or should not be ignored.  Java's `try-with-resources` 
statement does not mask an exception thrown in the try block with an exception 
caught in the 'close()' call. An exception caught in 'close()' 
would be added as a suppressed exception instead. IMHO, it is a better approach.  

 

Therefore, I propose to standardise those 'withXXX' functions with the following 
`withFinallyBlock` function, which does something similar to Java's 
try-with-resources statement. 
{code:java}
/**
* Drops temporary view `viewNames` after calling `f`.
*/
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  withFinallyBlock(f)(viewNames.foreach(spark.catalog.dropTempView))
}


/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock will be added to the exception thrown from tryBlock as 
a
 * suppressed exception. It helps to avoid masking the exception from tryBlock 
with the exception
 * from finallyBlock.
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
case cause : Throwable =>
  fromTryBlock = cause
  throw cause
  } finally {
if (fromTryBlock != null) {
  try finallyBlock catch {
case fromFinallyBlock : Throwable =>
  

[jira] [Updated] (SPARK-27772) SQLTestUtils Refactoring

2019-05-30 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-27772:
-
Description: 
The current `SQLTestUtils` provides many `withXXX` utility functions to clean up 
tables/views/caches created for testing purposes. Some of those `withXXX` 
functions ignore certain exceptions, like `NoSuchTableException`, in the 
cleanup block (i.e., the finally block). 

 
{code:java}
/**
 * Drops temporary view `viewNames` after calling `f`.
 */
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  try f finally {
// If the test failed part way, we don't want to mask the failure by 
failing to remove
// temp views that never got created.
try viewNames.foreach(spark.catalog.dropTempView) catch {
  case _: NoSuchTableException =>
}
  }
}
{code}
Maybe it is not the best approach, because it is hard to anticipate which 
exceptions should or should not be ignored.  Java's `try-with-resources` 
statement does not mask an exception thrown in the try block with an exception 
caught in the 'close()' call. An exception caught in 'close()' 
would be added as a suppressed exception instead. IMHO, it is a better approach.  

 

Therefore, I propose to standardise those 'withXXX' functions with the following 
`withFinallyBlock` function, which does something similar to Java's 
try-with-resources statement. 
{code:java}
/**
* Drops temporary view `viewNames` after calling `f`.
*/
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  withFinallyBlock(f)(viewNames.foreach(spark.catalog.dropTempView))
}


/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock will be added to the exception thrown from tryBlock as 
a
 * suppressed exception. It helps to avoid masking the exception from tryBlock 
with the exception
 * from finallyBlock.
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
case cause : Throwable =>
  fromTryBlock = cause
  throw cause
  } finally {
if (fromTryBlock != null) {
  try finallyBlock catch {
case fromFinallyBlock : Throwable =>
  fromTryBlock.addSuppressed(fromFinallyBlock)
  throw fromTryBlock
  }
} else {
  finallyBlock
}
  }
}
{code}
 

If a feature is well written, we should not hit any exception in those closing 
methods in test cases. The purpose of this proposal is to help developers 
identify what may break in a test case. I believe masking the original 
exception with any other exception is not the best approach.

  was:
The current `SQLTestUtils` provides many `withXXX` utility functions to clean up 
tables/views/caches created for testing purposes. Some of those `withXXX` 
functions ignore certain exceptions, like `NoSuchTableException`, in the 
cleanup block (i.e., the finally block). 

 
{code:java}
/**
 * Drops temporary view `viewNames` after calling `f`.
 */
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  try f finally {
// If the test failed part way, we don't want to mask the failure by 
failing to remove
// temp views that never got created.
try viewNames.foreach(spark.catalog.dropTempView) catch {
  case _: NoSuchTableException =>
}
  }
}
{code}
Maybe it is not the best approach, because it is hard to anticipate which 
exceptions should or should not be ignored.  Java's `try-with-resources` 
statement does not mask an exception thrown in the try block with an exception 
caught in the 'close()' call. An exception caught in 'close()' 
would be added as a suppressed exception instead. IMHO, it is a better approach.  

 

Therefore, I propose to standardise those 'withXXX' functions with the following 
`withFinallyBlock` function, which does something similar to Java's 
try-with-resources statement. 
{code:java}
/**
* Drops temporary view `viewNames` after calling `f`.
*/
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  withFinallyBlock(f)(viewNames.foreach(spark.catalog.dropTempView))
}


/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock will be added to the exception thrown from tryBlock as 
a
 * suppressed exception. It helps to avoid masking the exception from tryBlock 
with the exception
 * from finallyBlock.
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
case cause : Throwable =>
  fromTryBlock = cause
  throw cause
  } finally {
if (fromTryBlock != null) {
  try 

[jira] [Updated] (SPARK-27772) SQLTestUtils Refactoring

2019-05-29 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-27772:
-
Description: 
The current `SQLTestUtils` provides many `withXXX` utility functions to clean up 
tables/views/caches created for testing purposes. Some of those `withXXX` 
functions ignore certain exceptions, like `NoSuchTableException`, in the 
cleanup block (i.e., the finally block). 

 
{code:java}
/**
 * Drops temporary view `viewNames` after calling `f`.
 */
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  try f finally {
    // If the test failed part way, we don't want to mask the failure by failing to remove
    // temp views that never got created.
    try viewNames.foreach(spark.catalog.dropTempView) catch {
      case _: NoSuchTableException =>
    }
  }
}
{code}
Maybe it is not the best approach, because it is hard to anticipate which 
exceptions should or should not be ignored. Java's `try-with-resources` 
statement does not mask an exception thrown in the try block with any exception 
caught in the `close()` call; the exception caught in `close()` is added as a 
suppressed exception instead. IMHO, it is a better approach.  

 

Therefore, I propose to standardise those 'withXXX' functions with the following 
`withFinallyBlock` function, which does something similar to Java's 
try-with-resources statement. 
{code:java}
/**
* Drops temporary view `viewNames` after calling `f`.
*/
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  withFinallyBlock(f)(viewNames.foreach(spark.catalog.dropTempView))
}


/**
 * Executes the given tryBlock and then the given finallyBlock no matter whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the exception thrown
 * from the finallyBlock will be added to the exception thrown from tryBlock as a
 * suppressed exception. This helps avoid masking the exception from tryBlock with the
 * exception from finallyBlock.
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
case cause : Throwable =>
  fromTryBlock = cause
  throw cause
  } finally {
if (fromTryBlock != null) {
  try finallyBlock catch {
case fromFinallyBlock : Throwable =>
  fromTryBlock.addSuppressed(fromFinallyBlock)
  throw fromTryBlock
  }
} else {
  finallyBlock
}
  }
}
{code}
 

If a feature is well written, we should not hit any exception in those closing 
methods in test cases. The purpose of this proposal is to help developers 
identify what may break in a test case. Swallowing the original exception with 
any exception (not just the missing-table exception) thrown from those closing 
methods is, I believe, not the best approach.

  was:
The current `SQLTestUtils` created many `withXXX` utility functions to clean up 
tables/views/caches created for testing purpose. Some of those `withXXX` 
functions ignore certain exceptions, like `NoSuchTableException` in the clean 
up block (ie, the finally block). 

 
{code:java}
/**
 * Drops temporary view `viewNames` after calling `f`.
 */
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  try f finally {
// If the test failed part way, we don't want to mask the failure by 
failing to remove
// temp views that never got created.
try viewNames.foreach(spark.catalog.dropTempView) catch {
  case _: NoSuchTableException =>
}
  }
}
{code}
Maybe it is not the best approach. Because it is hard to anticipate what 
exception should or should not be ignored.  Java's `try-with-resources` 
statement does not mask exception throwing in the try block with any exception 
caught in the 'close()' statement. Exception caught in the 'close()' statement 
would add as a suppressed exception instead. IMHO, it is a better approach.  

 

Therefore, I proposed to standardise those 'withXXX' function with following 
`withFinallyBlock` function, which does something similar to Java's 
try-with-resources statement. 
{code:java}
/**
* Drops temporary view `viewNames` after calling `f`.
*/
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  withFinallyBlock(f)(viewNames.foreach(spark.catalog.dropTempView))
}


/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock with be added to the exception thrown from tryBlock as 
a
 * suppress exception. It helps to avoid masking the exception from tryBlock 
with exception
 * from finallyBlock
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
case cause : Throwable =>
  fromTryBlock = cause
 

[jira] [Updated] (SPARK-27772) SQLTestUtils Refactoring

2019-05-29 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-27772:
-
Description: 
The current `SQLTestUtils` created many `withXXX` utility functions to clean up 
tables/views/caches created for testing purpose. Some of those `withXXX` 
functions ignore certain exceptions, like `NoSuchTableException` in the clean 
up block (ie, the finally block). 

 
{code:java}
/**
 * Drops temporary view `viewNames` after calling `f`.
 */
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  try f finally {
// If the test failed part way, we don't want to mask the failure by 
failing to remove
// temp views that never got created.
try viewNames.foreach(spark.catalog.dropTempView) catch {
  case _: NoSuchTableException =>
}
  }
}
{code}
Maybe it is not the best approach. Because it is hard to anticipate what 
exception should or should not be ignored.  Java's `try-with-resources` 
statement does not mask exception throwing in the try block with any exception 
caught in the 'close()' statement. Exception caught in the 'close()' statement 
would add as a suppressed exception instead. IMHO, it is a better approach.  

 

Therefore, I proposed to standardise those 'withXXX' function with following 
`withFinallyBlock` function, which does something similar to Java's 
try-with-resources statement. 
{code:java}
/**
* Drops temporary view `viewNames` after calling `f`.
*/
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  withFinallyBlock(f)(viewNames.foreach(spark.catalog.dropTempView))
}


/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock with be added to the exception thrown from tryBlock as 
a
 * suppress exception. It helps to avoid masking the exception from tryBlock 
with exception
 * from finallyBlock
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
case cause : Throwable =>
  fromTryBlock = cause
  throw cause
  } finally {
if (fromTryBlock != null) {
  try finallyBlock catch {
case fromFinallyBlock : Throwable =>
  fromTryBlock.addSuppressed(fromFinallyBlock)
  throw fromTryBlock
  }
} else {
  finallyBlock
}
  }
}
{code}
 

  was:
The current `SQLTestUtils` created many `withXXX` utility functions to clean up 
tables/views/caches created for testing purpose. Some of those `withXXX` 
functions ignore certain exceptions, like `NoSuchTableException` in the clean 
up block (ie, the finally block). 

 
{code:java}
/**
 * Drops temporary view `viewNames` after calling `f`.
 */
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  try f finally {
// If the test failed part way, we don't want to mask the failure by 
failing to remove
// temp views that never got created.
try viewNames.foreach(spark.catalog.dropTempView) catch {
  case _: NoSuchTableException =>
}
  }
}
{code}
Maybe it is not the best approach. Because it is hard to anticipate what 
exception should or should not be ignored.  

 

We may hit similar scenario with Java's `try-with-resources` statement. Java 
does not mask exception throws in the try block with any exception caught in 
the 'close()' statement. Exception caught in the 'close()' statement would add 
as a suppressed exception instead. IMHO, it is a better approach.  

 

Therefore, I proposed to standardise those 'withXXX' function with following 
`withFinallyBlock` function, which does something similar to Java's 
try-with-resources statement. 
{code:java}
/**
* Drops temporary view `viewNames` after calling `f`.
*/
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  withFinallyBlock(f)(viewNames.foreach(spark.catalog.dropTempView))
}


/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock with be added to the exception thrown from tryBlock as 
a
 * suppress exception. It helps to avoid masking the exception from tryBlock 
with exception
 * from finallyBlock
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
case cause : Throwable =>
  fromTryBlock = cause
  throw cause
  } finally {
if (fromTryBlock != null) {
  try finallyBlock catch {
case fromFinallyBlock : Throwable =>
  fromTryBlock.addSuppressed(fromFinallyBlock)
  throw fromTryBlock
  }
} else {
  finallyBlock
}
  }
}
{code}
 


> SQLTestUtils Refactoring
> 

[jira] [Commented] (SPARK-27772) SQLTestUtils Refactoring

2019-05-28 Thread William Wong (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850258#comment-16850258
 ] 

William Wong commented on SPARK-27772:
--

If a feature is well written, we don't expect to hit any exception in those 
closing methods. Maybe that is why you did not hit any issue with that. 

The purpose of this proposal is to help developers identify what may break in 
the test case. Swallowing the original exception with any exception (not just 
the missing-table exception) thrown from those closing methods is, I believe, 
not the best approach. 

I will create a PR for this JIRA soon for reviewing. 

> SQLTestUtils Refactoring
> 
>
> Key: SPARK-27772
> URL: https://issues.apache.org/jira/browse/SPARK-27772
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: William Wong
>Priority: Minor
>
> The current `SQLTestUtils` created many `withXXX` utility functions to clean 
> up tables/views/caches created for testing purpose. Some of those `withXXX` 
> functions ignore certain exceptions, like `NoSuchTableException` in the clean 
> up block (ie, the finally block). 
>  
> {code:java}
> /**
>  * Drops temporary view `viewNames` after calling `f`.
>  */
> protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
>   try f finally {
> // If the test failed part way, we don't want to mask the failure by 
> failing to remove
> // temp views that never got created.
> try viewNames.foreach(spark.catalog.dropTempView) catch {
>   case _: NoSuchTableException =>
> }
>   }
> }
> {code}
> Maybe it is not the best approach. Because it is hard to anticipate what 
> exception should or should not be ignored.  
>  
> We may hit similar scenario with Java's `try-with-resources` statement. Java 
> does not mask exception throws in the try block with any exception caught in 
> the 'close()' statement. Exception caught in the 'close()' statement would 
> add as a suppressed exception instead. IMHO, it is a better approach.  
>  
> Therefore, I proposed to standardise those 'withXXX' function with following 
> `withFinallyBlock` function, which does something similar to Java's 
> try-with-resources statement. 
> {code:java}
> /**
> * Drops temporary view `viewNames` after calling `f`.
> */
> protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
>   withFinallyBlock(f)(viewNames.foreach(spark.catalog.dropTempView))
> }
> /**
>  * Executes the given tryBlock and then the given finallyBlock no matter 
> whether tryBlock throws
>  * an exception. If both tryBlock and finallyBlock throw exceptions, the 
> exception thrown
>  * from the finallyBlock with be added to the exception thrown from tryBlock 
> as a
>  * suppress exception. It helps to avoid masking the exception from tryBlock 
> with exception
>  * from finallyBlock
>  */
> private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit 
> = {
>   var fromTryBlock : Throwable = null
>   try tryBlock catch {
> case cause : Throwable =>
>   fromTryBlock = cause
>   throw cause
>   } finally {
> if (fromTryBlock != null) {
>   try finallyBlock catch {
> case fromFinallyBlock : Throwable =>
>   fromTryBlock.addSuppressed(fromFinallyBlock)
>   throw fromTryBlock
>   }
> } else {
>   finallyBlock
> }
>   }
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27772) SQLTestUtils Refactoring

2019-05-19 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-27772:
-
Description: 
The current `SQLTestUtils` created many `withXXX` utility functions to clean up 
tables/views/caches created for testing purpose. Some of those `withXXX` 
functions ignore certain exceptions, like `NoSuchTableException` in the clean 
up block (ie, the finally block). 

 
{code:java}
/**
 * Drops temporary view `viewNames` after calling `f`.
 */
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  try f finally {
// If the test failed part way, we don't want to mask the failure by 
failing to remove
// temp views that never got created.
try viewNames.foreach(spark.catalog.dropTempView) catch {
  case _: NoSuchTableException =>
}
  }
}
{code}
Maybe it is not the best approach. Because it is hard to anticipate what 
exception should or should not be ignored.  

 

We may hit similar scenario with Java's `try-with-resources` statement. Java 
does not mask exception throws in the try block with any exception caught in 
the 'close()' statement. Exception caught in the 'close()' statement would add 
as a suppressed exception instead. IMHO, it is a better approach.  

 

Therefore, I proposed to standardise those 'withXXX' function with following 
`withFinallyBlock` function, which does something similar to Java's 
try-with-resources statement. 
{code:java}
/**
* Drops temporary view `viewNames` after calling `f`.
*/
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  withFinallyBlock(f)(viewNames.foreach(spark.catalog.dropTempView))
}


/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock with be added to the exception thrown from tryBlock as 
a
 * suppress exception. It helps to avoid masking the exception from tryBlock 
with exception
 * from finallyBlock
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
case cause : Throwable =>
  fromTryBlock = cause
  throw cause
  } finally {
if (fromTryBlock != null) {
  try finallyBlock catch {
case fromFinallyBlock : Throwable =>
  fromTryBlock.addSuppressed(fromFinallyBlock)
  throw fromTryBlock
  }
} else {
  finallyBlock
}
  }
}
{code}
 

  was:
The current `SQLTestUtils` created many `withXXX` utility functions to clean up 
tables/views/caches created for testing purpose. 

Some of those `withXXX` functions would ignore certain exceptions, like 
`NoSuchTableException` in the clean up block (finally block). 

 
{code:java}
/**
 * Drops temporary view `viewNames` after calling `f`.
 */
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  try f finally {
// If the test failed part way, we don't want to mask the failure by 
failing to remove
// temp views that never got created.
try viewNames.foreach(spark.catalog.dropTempView) catch {
  case _: NoSuchTableException =>
}
  }
}
{code}
Maybe it is not the best option. If a test hits an exception in the 'f' closure, 
we should not mask that exception with any other exception hit in the finally 
block. The exception caught in the finally block should instead be attached to 
the original exception as a suppressed exception. The idea is similar to how 
Java handles the 'try-with-resources' statement. 

 

I propose to standardise those 'withXXX' functions with the following 
`withFinallyBlock` function.
{code:java}
/**
* Drops temporary view `viewNames` after calling `f`.
*/
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  withFinallyBlock(f)(viewNames.foreach(spark.catalog.dropTempView))
}


/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock with be added to the exception thrown from tryBlock as 
a
 * suppress exception. It helps to avoid masking the exception from tryBlock 
with exception
 * from finallyBlock
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
case cause : Throwable =>
  fromTryBlock = cause
  throw cause
  } finally {
if (fromTryBlock != null) {
  try finallyBlock catch {
case fromFinallyBlock : Throwable =>
  fromTryBlock.addSuppressed(fromFinallyBlock)
  throw fromTryBlock
  }
} else {
  finallyBlock
}
  }
}
{code}
 


> SQLTestUtils Refactoring
> 
>
> Key: SPARK-27772
> 

[jira] [Updated] (SPARK-27772) SQLTestUtils Refactoring

2019-05-19 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-27772:
-
Description: 
The current `SQLTestUtils` created many `withXXX` utility functions to clean up 
tables/views/caches created for testing purpose. 

Some of those `withXXX` functions would ignore certain exceptions, like 
`NoSuchTableException` in the clean up block (finally block). 

 
{code:java}
/**
 * Drops temporary view `viewNames` after calling `f`.
 */
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  try f finally {
// If the test failed part way, we don't want to mask the failure by 
failing to remove
// temp views that never got created.
try viewNames.foreach(spark.catalog.dropTempView) catch {
  case _: NoSuchTableException =>
}
  }
}
{code}
Maybe it is not the best option. If a test hit an exception in the 'f' closure, 
no matter what exception we hit in the finally block, we should not mask that 
exception with any other exception hit in the finally block. The exception 
caught in the finally block should be reattached to the original exception as a 
suppressed exception. The idea is similar to how java handle 
'try-with-resources' statement. 

 

A proposed to standardise those 'withXXX' function with following 
`withFinallyBlock` function.
{code:java}
/**
* Drops temporary view `viewNames` after calling `f`.
*/
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  withFinallyBlock(f)(viewNames.foreach(spark.catalog.dropTempView))
}


/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock with be added to the exception thrown from tryBlock as 
a
 * suppress exception. It helps to avoid masking the exception from tryBlock 
with exception
 * from finallyBlock
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
case cause : Throwable =>
  fromTryBlock = cause
  throw cause
  } finally {
if (fromTryBlock != null) {
  try finallyBlock catch {
case fromFinallyBlock : Throwable =>
  fromTryBlock.addSuppressed(fromFinallyBlock)
  throw fromTryBlock
  }
} else {
  finallyBlock
}
  }
}
{code}
 

  was:
The current `SQLTestUtils` created many `withXXX` utility functions to clean up 
tables/views/caches created for testing purpose. 

Some of those `withXXX` functions would ignore certain exceptions, like 
`NoSuchTableException` in the clean up block (finally block). 

 
{code:java}
/**
 * Drops table `tableName` after calling `f`.
 */
protected def withTable(tableNames: String*)(f: => Unit): Unit = {
  try f finally {
tableNames.foreach { name =>
  spark.sql(s"DROP TABLE IF EXISTS $name")
}
  }
}
{code}
Maybe it is not the best option. If a test hit an exception in the 'f' closure, 
no matter what exception we hit in the finally block, we should not mask that 
exception with any other exception hit in the finally block. The exception 
caught in the finally block should be reattached to the original exception as a 
suppressed exception. The idea is similar to how java handle 
'try-with-resources' statement. 

 

A proposed to standardise those 'withXXX' function with following 
`withFinallyBlock` function.
{code:java}
 /**
   * Drops table `tableName` after calling `f`.
   */
  protected def withTable(tableNames: String*)(f: => Unit): Unit = {
withFinallyBlock(f)(
  tableNames.foreach { name =>
spark.sql(s"DROP TABLE IF EXISTS $name")
  }
)
  }

/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock with be added to the exception thrown from tryBlock as 
a
 * suppress exception. It helps to avoid masking the exception from tryBlock 
with exception
 * from finallyBlock
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
case cause : Throwable =>
  fromTryBlock = cause
  throw cause
  } finally {
if (fromTryBlock != null) {
  try finallyBlock catch {
case fromFinallyBlock : Throwable =>
  fromTryBlock.addSuppressed(fromFinallyBlock)
  throw fromTryBlock
  }
} else {
  finallyBlock
}
  }
}
{code}
 


> SQLTestUtils Refactoring
> 
>
> Key: SPARK-27772
> URL: https://issues.apache.org/jira/browse/SPARK-27772
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>

[jira] [Updated] (SPARK-27772) SQLTestUtils Refactoring

2019-05-19 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-27772:
-
Description: 
The current `SQLTestUtils` created many `withXXX` utility functions to clean up 
tables/views/caches created for testing purpose. 

Some of those `withXXX` functions would ignore certain exceptions, like 
`NoSuchTableException` in the clean up block (finally block). 

 
{code:java}
/**
 * Drops table `tableName` after calling `f`.
 */
protected def withTable(tableNames: String*)(f: => Unit): Unit = {
  try f finally {
tableNames.foreach { name =>
  spark.sql(s"DROP TABLE IF EXISTS $name")
}
  }
}
{code}
Maybe it is not the best option. If a test hit an exception in the 'f' closure, 
no matter what exception we hit in the finally block, we should not mask that 
exception with any other exception hit in the finally block. The exception 
caught in the finally block should be reattached to the original exception as a 
suppressed exception. The idea is similar to how java handle 
'try-with-resources' statement. 

 

A proposed to standardise those 'withXXX' function with following 
`withFinallyBlock` function.
{code:java}
 /**
   * Drops table `tableName` after calling `f`.
   */
  protected def withTable(tableNames: String*)(f: => Unit): Unit = {
withFinallyBlock(f)(
  tableNames.foreach { name =>
spark.sql(s"DROP TABLE IF EXISTS $name")
  }
)
  }

/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock with be added to the exception thrown from tryBlock as 
a
 * suppress exception. It helps to avoid masking the exception from tryBlock 
with exception
 * from finallyBlock
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
case cause : Throwable =>
  fromTryBlock = cause
  throw cause
  } finally {
if (fromTryBlock != null) {
  try finallyBlock catch {
case fromFinallyBlock : Throwable =>
  fromTryBlock.addSuppressed(fromFinallyBlock)
  throw fromTryBlock
  }
} else {
  finallyBlock
}
  }
}
{code}
 

  was:
The current `SQLTestUtils` created many `withXXX` utility functions to clean up 
tables/views/caches created for testing purpose. 

Some of those `withXXX` functions would ignore certain exceptions, like 
`NoSuchTableException` in the clean up block (finally block). 

 
{code:java}
/**
 * Drops table `tableName` after calling `f`.
 */
protected def withTable(tableNames: String*)(f: => Unit): Unit = {
  try f finally {
tableNames.foreach { name =>
  spark.sql(s"DROP TABLE IF EXISTS $name")
}
  }
}
{code}
Maybe it is not the best option. If a test hit an exception in the 'f' closure, 
no matter what exception we hit in the finally block, we should not mask that 
exception with any other exception hit in the finally block. The exception 
caught in the finally block should be reattached to the original exception as a 
suppressed exception. The idea is similar to how java handle 
'try-with-resources' statement. 

 

A proposed to create following function to standardise those 'withXXX' 
functions. 
{code:java}
/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock with be added to the exception thrown from tryBlock as 
a
 * suppress exception. It helps to avoid masking the exception from tryBlock 
with exception
 * from finallyBlock
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
case cause : Throwable =>
  fromTryBlock = cause
  throw cause
  } finally {
if (fromTryBlock != null) {
  try finallyBlock catch {
case fromFinallyBlock : Throwable =>
  fromTryBlock.addSuppressed(fromFinallyBlock)
  throw fromTryBlock
  }
} else {
  finallyBlock
}
  }
}
{code}
 


> SQLTestUtils Refactoring
> 
>
> Key: SPARK-27772
> URL: https://issues.apache.org/jira/browse/SPARK-27772
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: William Wong
>Priority: Minor
>
> The current `SQLTestUtils` created many `withXXX` utility functions to clean 
> up tables/views/caches created for testing purpose. 
> Some of those `withXXX` functions would ignore certain exceptions, like 
> `NoSuchTableException` in the clean up block (finally block). 
>  
> {code:java}
> /**
>  * Drops table `tableName` 

[jira] [Updated] (SPARK-27772) SQLTestUtils Refactoring

2019-05-19 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-27772:
-
Description: 
The current `SQLTestUtils` created many `withXXX` utility functions to clean up 
tables/views/caches created for testing purpose. 

Some of those `withXXX` functions would ignore certain exceptions, like 
`NoSuchTableException` in the clean up block (finally block). 

 
{code:java}
/**
 * Drops table `tableName` after calling `f`.
 */
protected def withTable(tableNames: String*)(f: => Unit): Unit = {
  try f finally {
tableNames.foreach { name =>
  spark.sql(s"DROP TABLE IF EXISTS $name")
}
  }
}
{code}
Maybe it is not the best option. If a test hit an exception in the 'f' closure, 
no matter what exception we hit in the finally block, we should not mask that 
exception with any other exception hit in the finally block. The exception 
caught in the finally block should be reattached to the original exception as a 
suppressed exception. The idea is similar to how java handle 
'try-with-resources' statement. 

 

A proposed to create following function to standardise those 'withXXX' 
functions. 
{code:java}
/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock with be added to the exception thrown from tryBlock as 
a
 * suppress exception. It helps to avoid masking the exception from tryBlock 
with exception
 * from finallyBlock
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
case cause : Throwable =>
  fromTryBlock = cause
  throw cause
  } finally {
if (fromTryBlock != null) {
  try finallyBlock catch {
case fromFinallyBlock : Throwable =>
  fromTryBlock.addSuppressed(fromFinallyBlock)
  throw fromTryBlock
  }
} else {
  finallyBlock
}
  }
}
{code}
 

  was:
{code:java}
// code placeholder
{code}
The current `SQLTestUtils` created many `withXXX` utility functions to clean up 
tables/views/caches created for testing purpose. 

Some of those `withXXX` functions would ignore certain exceptions, like 
`NoSuchTableException` in the clean up block (finally block). 

```
 /**
 * Drops temporary view `viewNames` after calling `f`.
 */
 protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
   try f finally {
     // If the test failed part way, we don't want to mask the failure by failing to remove
     // temp views that never got created.
     try viewNames.foreach(spark.catalog.dropTempView) catch {
       case _: NoSuchTableException =>
     }
   }
 }

```

Maybe it is not the best option. If a test hit an exception in the 'f' closure, 
no matter what exception we hit in the finally block, we should not mask that 
exception with any other exception hit in the finally block. The exception 
caught in the finally block should be reattached to the original exception as a 
suppressed exception. The idea is similar to how java handle 
'try-with-resources' statement. 

 

A proposed to create following function to standardise those 'withXXX' 
functions. 

```

/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock with be added to the exception thrown from tryBlock as 
a
 * suppress exception. It helps to avoid masking the exception from tryBlock 
with exception
 * from finallyBlock
 */
 private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
   var fromTryBlock: Throwable = null
   try tryBlock catch {
     case cause: Throwable =>
       fromTryBlock = cause
       throw cause
   } finally {
     if (fromTryBlock != null) {
       try finallyBlock catch {
         case fromFinallyBlock: Throwable =>
           fromTryBlock.addSuppressed(fromFinallyBlock)
           throw fromTryBlock
       }
     } else {
       finallyBlock
     }
   }
 }

```

 


> SQLTestUtils Refactoring
> 
>
> Key: SPARK-27772
> URL: https://issues.apache.org/jira/browse/SPARK-27772
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: William Wong
>Priority: Minor
>
> The current `SQLTestUtils` created many `withXXX` utility functions to clean 
> up tables/views/caches created for testing purpose. 
> Some of those `withXXX` functions would ignore certain exceptions, like 
> `NoSuchTableException` in the clean up block (finally block). 
>  
> {code:java}
> /**
>  * Drops table `tableName` after calling `f`.
>  

[jira] [Updated] (SPARK-27772) SQLTestUtils Refactoring

2019-05-19 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-27772:
-
Description: 
{code:java}
// code placeholder
{code}
The current `SQLTestUtils` created many `withXXX` utility functions to clean up 
tables/views/caches created for testing purpose. 

Some of those `withXXX` functions would ignore certain exceptions, like 
`NoSuchTableException` in the clean up block (finally block). 

```
 /**
 * Drops temporary view `viewNames` after calling `f`.
 */
 protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
   try f finally {
     // If the test failed part way, we don't want to mask the failure by failing to remove
     // temp views that never got created.
     try viewNames.foreach(spark.catalog.dropTempView) catch {
       case _: NoSuchTableException =>
     }
   }
 }

```

Maybe it is not the best option. If a test hit an exception in the 'f' closure, 
no matter what exception we hit in the finally block, we should not mask that 
exception with any other exception hit in the finally block. The exception 
caught in the finally block should be reattached to the original exception as a 
suppressed exception. The idea is similar to how java handle 
'try-with-resources' statement. 

 

A proposed to create following function to standardise those 'withXXX' 
functions. 

```

/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock with be added to the exception thrown from tryBlock as 
a
 * suppress exception. It helps to avoid masking the exception from tryBlock 
with exception
 * from finallyBlock
 */
 private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
   var fromTryBlock: Throwable = null
   try tryBlock catch {
     case cause: Throwable =>
       fromTryBlock = cause
       throw cause
   } finally {
     if (fromTryBlock != null) {
       try finallyBlock catch {
         case fromFinallyBlock: Throwable =>
           fromTryBlock.addSuppressed(fromFinallyBlock)
           throw fromTryBlock
       }
     } else {
       finallyBlock
     }
   }
 }

```

 

  was:
The current `SQLTestUtils` created many `withXXX` utility functions to clean up 
tables/views/caches created for testing purpose. 

Some of those `withXXX` functions would ignore certain exceptions, like 
`NoSuchTableException` in the clean up block (finally block). 

```
/**
 * Drops temporary view `viewNames` after calling `f`.
 */
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  try f finally {
    // If the test failed part way, we don't want to mask the failure by 
failing to remove
    // temp views that never got created.
    try viewNames.foreach(spark.catalog.dropTempView) catch { 
      case _: NoSuchTableException =>
    }
  }
}

```

Maybe it is not the best option. If a test hit an exception in the 'f' closure, 
no matter what exception we hit in the finally block, we should not mask that 
exception with any other exception hit in the finally block. The exception 
caught in the finally block should be reattached to the original exception as a 
suppressed exception. The idea is similar to how java handle 
'try-with-resources' statement. 

 

A proposed to create following function to standardise those 'withXXX' 
functions. 

```

/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock with be added to the exception thrown from tryBlock as 
a
 * suppress exception. It helps to avoid masking the exception from tryBlock 
with exception
 * from finallyBlock
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
    case cause : Throwable =>
      fromTryBlock = cause
      throw cause
    } finally {
      if (fromTryBlock != null) {
        try finallyBlock catch {
          case fromFinallyBlock : Throwable =>
            fromTryBlock.addSuppressed(fromFinallyBlock)
            throw fromTryBlock 
        }
      } else {
        finallyBlock 
     }
  }
}

```

 


> SQLTestUtils Refactoring
> 
>
> Key: SPARK-27772
> URL: https://issues.apache.org/jira/browse/SPARK-27772
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: William Wong
>Priority: Minor
>
> {code:java}
> // code placeholder
> {code}
> The current `SQLTestUtils` created many `withXXX` utility functions to clean 
> up tables/views/caches created for testing purpose. 

[jira] [Created] (SPARK-27772) SQLTestUtils Refactoring

2019-05-19 Thread William Wong (JIRA)
William Wong created SPARK-27772:


 Summary: SQLTestUtils Refactoring
 Key: SPARK-27772
 URL: https://issues.apache.org/jira/browse/SPARK-27772
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: William Wong


The current `SQLTestUtils` created many `withXXX` utility functions to clean up 
tables/views/caches created for testing purpose. 

Some of those `withXXX` functions would ignore certain exceptions, like 
`NoSuchTableException` in the clean up block (finally block). 

```
/**
 * Drops temporary view `viewNames` after calling `f`.
 */
protected def withTempView(viewNames: String*)(f: => Unit): Unit = {
  try f finally {
    // If the test failed part way, we don't want to mask the failure by 
failing to remove
    // temp views that never got created.
    try viewNames.foreach(spark.catalog.dropTempView) catch { 
      case _: NoSuchTableException =>
    }
  }
}

```

Maybe it is not the best option. If a test hit an exception in the 'f' closure, 
no matter what exception we hit in the finally block, we should not mask that 
exception with any other exception hit in the finally block. The exception 
caught in the finally block should be reattached to the original exception as a 
suppressed exception. The idea is similar to how java handle 
'try-with-resources' statement. 

 

A proposed to create following function to standardise those 'withXXX' 
functions. 

```

/**
 * Executes the given tryBlock and then the given finallyBlock no matter 
whether tryBlock throws
 * an exception. If both tryBlock and finallyBlock throw exceptions, the 
exception thrown
 * from the finallyBlock with be added to the exception thrown from tryBlock as 
a
 * suppress exception. It helps to avoid masking the exception from tryBlock 
with exception
 * from finallyBlock
 */
private def withFinallyBlock(tryBlock: => Unit)(finallyBlock: => Unit): Unit = {
  var fromTryBlock : Throwable = null
  try tryBlock catch {
    case cause : Throwable =>
      fromTryBlock = cause
      throw cause
    } finally {
      if (fromTryBlock != null) {
        try finallyBlock catch {
          case fromFinallyBlock : Throwable =>
            fromTryBlock.addSuppressed(fromFinallyBlock)
            throw fromTryBlock 
        }
      } else {
        finallyBlock 
     }
  }
}

```

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27062) CatalogImpl.refreshTable should register query in cache with received tableName

2019-04-15 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong resolved SPARK-27062.
--
Resolution: Duplicate

> CatalogImpl.refreshTable should register query in cache with received 
> tableName
> ---
>
> Key: SPARK-27062
> URL: https://issues.apache.org/jira/browse/SPARK-27062
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: William Wong
>Priority: Minor
>  Labels: easyfix, pull-request-available
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> If _CatalogImpl.refreshTable()_ method is invoked against a cached table, 
> this method would first uncache corresponding query in the shared state cache 
> manager, and then cache it back to refresh the cache copy. 
> However, the table is recached with only the table name; the database name 
> is dropped. Therefore, if the cached table is not in the default database, 
> the recreated cache entry may refer to a different table. For example, the 
> cached table name shown on the driver's storage page may change after the 
> table is refreshed. 
> Here is related code on github for your reference. 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala]
>  
>  
> {code:java}
> override def refreshTable(tableName: String): Unit = {
>   val tableIdent = 
> sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
>   val tableMetadata = 
> sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
>   val table = sparkSession.table(tableIdent)
>   if (tableMetadata.tableType == CatalogTableType.VIEW) {
> // Temp or persistent views: refresh (or invalidate) any metadata/data 
> cached
> // in the plan recursively.
> table.queryExecution.analyzed.refresh()
>   } else {
> // Non-temp tables: refresh the metadata cache.
> sessionCatalog.refreshTable(tableIdent)
>   }
>   // If this table is cached as an InMemoryRelation, drop the original
>   // cached version and make the new version cached lazily.
>   if (isCached(table)) {
> // Uncache the logicalPlan.
> sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true, 
> blocking = true)
> // Cache it again.
> sparkSession.sharedState.cacheManager.cacheQuery(table, 
> Some(tableIdent.table))
>   }
> }
> {code}
>  
>  CatalogImpl cache table with received _tableName_, instead of 
> _tableIdent.table_
> {code:java}
> override def cacheTable(tableName: String): Unit = {
> sparkSession.sharedState.cacheManager.cacheQuery(sparkSession.table(tableName),
>  Some(tableName)) }
> {code}
>  
> Therefore, I would like to propose aligning the behavior. RefreshTable method 
> should reuse the received _tableName_. Here is the proposed line of changes.
>  
> {code:java}
> sparkSession.sharedState.cacheManager.cacheQuery(table, 
> Some(tableIdent.table))
> {code}
> to 
> {code:java}
> sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableName)){code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27458) Remind developer using IntelliJ to update maven version

2019-04-15 Thread William Wong (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818100#comment-16818100
 ] 

William Wong commented on SPARK-27458:
--

PR ([https://github.com/apache/spark-website/pull/195]) was created. 

> Remind developer using IntelliJ to update maven version
> ---
>
> Key: SPARK-27458
> URL: https://issues.apache.org/jira/browse/SPARK-27458
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: William Wong
>Priority: Major
>
> I am using IntelliJ to update a few spark source. I tried to follow the guide 
> at '[http://spark.apache.org/developer-tools.html]' to setup an IntelliJ 
> project for Spark. However, the project was failed to build. It was due to 
> missing classes generated via antlr on sql/catalyst project. I tried to click 
> the button 'Generate Sources and Update Folders for all Projects' but it does 
> not help. Antlr task was not triggered as expected.
> Checked the IntelliJ log file and found that it was because I did not set the 
> maven version properly and the 'Generate Sources and Update Folders for all 
> Projects' process was failed silently: 
>  
> _2019-04-14 16:05:24,796 [ 314609]   INFO -      #org.jetbrains.idea.maven - 
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion 
> failed with message:_
> _Detected Maven Version: 3.3.9 is not in the allowed range 3.6.0._
> _2019-04-14 16:05:24,813 [ 314626]   INFO -      #org.jetbrains.idea.maven - 
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce 
> (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have 
> failed. Look above for specific messages explaining why the rule failed._
> _java.lang.RuntimeException: 
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce 
> (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have 
> failed. Look above for specific messages explaining why the rule failed._
>  
> Be honest, failing an action silently should be an IntelliJ bug. However, 
> enhancing the page  '[http://spark.apache.org/developer-tools.html]' to 
> remind developers to check the maven version may save those new joiners some 
> time. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27458) Remind developer using IntelliJ to update maven version

2019-04-14 Thread William Wong (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817347#comment-16817347
 ] 

William Wong commented on SPARK-27458:
--

Is '[https://spark.apache.org/developer-tools.html'] a part of spark 
documentation? I cannot find it on sparks' repository (under the docs folder). 

> Remind developer using IntelliJ to update maven version
> ---
>
> Key: SPARK-27458
> URL: https://issues.apache.org/jira/browse/SPARK-27458
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: William Wong
>Priority: Major
>
> I am using IntelliJ to update a few spark source. I tried to follow the guide 
> at '[http://spark.apache.org/developer-tools.html]' to setup an IntelliJ 
> project for Spark. However, the project was failed to build. It was due to 
> missing classes generated via antlr on sql/catalyst project. I tried to click 
> the button 'Generate Sources and Update Folders for all Projects' but it does 
> not help. Antlr task was not triggered as expected.
> Checked the IntelliJ log file and found that it was because I did not set the 
> maven version properly and the 'Generate Sources and Update Folders for all 
> Projects' process was failed silently: 
>  
> _2019-04-14 16:05:24,796 [ 314609]   INFO -      #org.jetbrains.idea.maven - 
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion 
> failed with message:_
> _Detected Maven Version: 3.3.9 is not in the allowed range 3.6.0._
> _2019-04-14 16:05:24,813 [ 314626]   INFO -      #org.jetbrains.idea.maven - 
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce 
> (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have 
> failed. Look above for specific messages explaining why the rule failed._
> _java.lang.RuntimeException: 
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce 
> (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have 
> failed. Look above for specific messages explaining why the rule failed._
>  
> Be honest, failing an action silently should be an IntelliJ bug. However, 
> enhancing the page  '[http://spark.apache.org/developer-tools.html]' to 
> remind developers to check the maven version may save those new joiners some 
> time. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27458) Remind developer using IntelliJ to update maven version

2019-04-14 Thread William Wong (JIRA)
William Wong created SPARK-27458:


 Summary: Remind developer using IntelliJ to update maven version
 Key: SPARK-27458
 URL: https://issues.apache.org/jira/browse/SPARK-27458
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.0.0
Reporter: William Wong


I am using IntelliJ to update a few Spark source files. I tried to follow the 
guide at '[http://spark.apache.org/developer-tools.html]' to set up an IntelliJ 
project for Spark. However, the project failed to build because of missing 
classes generated via antlr in the sql/catalyst project. I tried to click the 
button 'Generate Sources and Update Folders for all Projects' but it did not 
help; the antlr task was not triggered as expected.

I checked the IntelliJ log file and found that it was because I did not set the 
maven version properly, so the 'Generate Sources and Update Folders for all 
Projects' process failed silently: 

 
_2019-04-14 16:05:24,796 [ 314609]   INFO -      #org.jetbrains.idea.maven - 
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed 
with message:_
_Detected Maven Version: 3.3.9 is not in the allowed range 3.6.0._
_2019-04-14 16:05:24,813 [ 314626]   INFO -      #org.jetbrains.idea.maven - 
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal 
org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce 
(enforce-versions) on project spark-parent_2.12: Some Enforcer rules have 
failed. Look above for specific messages explaining why the rule failed._
_java.lang.RuntimeException: 
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal 
org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce 
(enforce-versions) on project spark-parent_2.12: Some Enforcer rules have 
failed. Look above for specific messages explaining why the rule failed._
 
To be honest, failing an action silently looks like an IntelliJ bug. However, 
enhancing the page '[http://spark.apache.org/developer-tools.html]' to remind 
developers to check the maven version may save new joiners some time. 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24130) Data Source V2: Join Push Down

2019-03-28 Thread William Wong (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780735#comment-16780735
 ] 

William Wong edited comment on SPARK-24130 at 3/28/19 7:31 AM:
---

Hi [~jliwork], Yes. It will be a very valuable enhancement. Appreciate if you 
can let us know the progress. Many thanks. 


was (Author: william1104):
Hi [~smilegator], Yes. It will be a very valuable enhancement. Appreciate if 
you can let us know the progress. Many thanks. 

> Data Source V2: Join Push Down
> --
>
> Key: SPARK-24130
> URL: https://issues.apache.org/jira/browse/SPARK-24130
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Jia Li
>Priority: Major
> Attachments: Data Source V2 Join Push Down.pdf
>
>
> Spark applications often directly query external data sources such as 
> relational databases, or files. Spark provides Data Sources APIs for 
> accessing structured data through Spark SQL. Data Sources APIs in both V1 and 
> V2 support optimizations such as Filter push down and Column pruning which 
> are subset of the functionality that can be pushed down to some data sources. 
> We’re proposing to extend Data Sources APIs with join push down (JPD). Join 
> push down significantly improves query performance by reducing the amount of 
> data transfer and exploiting the capabilities of the data sources such as 
> index access.
> Join push down design document is available 
> [here|https://docs.google.com/document/d/1k-kRadTcUbxVfUQwqBbIXs_yPZMxh18-e-cz77O_TaE/edit?usp=sharing].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27248) REFRESH TABLE should recreate cache with same cache name and storage level

2019-03-26 Thread William Wong (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801702#comment-16801702
 ] 

William Wong commented on SPARK-27248:
--

https://github.com/apache/spark/pull/24221

Hi @Hyukjin, just created a PR. Hope it is good enough. If not, please let me 
know and I will fix it. Many thanks. Best regards, William

> REFRESH TABLE should recreate cache with same cache name and storage level
> --
>
> Key: SPARK-27248
> URL: https://issues.apache.org/jira/browse/SPARK-27248
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: William Wong
>Priority: Major
>
> If we refresh a cached table, the table cache will be first uncached and then 
> recache (lazily). Currently, the logic is embedded in 
> CatalogImpl.refreshTable method.
> The current implementation does not preserve the cache name and storage 
> level. As a result, cache name and cache level could be changed after a 
> REFERSH. IMHO, it is not what a user would expect.
> I would like to fix this behavior by first save the cache name and storage 
> level for recaching the table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27248) REFRESH TABLE should recreate cache with same cache name and storage level

2019-03-22 Thread William Wong (JIRA)
William Wong created SPARK-27248:


 Summary: REFRESH TABLE should recreate cache with same cache name 
and storage level
 Key: SPARK-27248
 URL: https://issues.apache.org/jira/browse/SPARK-27248
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: William Wong


If we refresh a cached table, the table cache is first uncached and then 
recached (lazily). Currently, this logic is embedded in the 
CatalogImpl.refreshTable method.

The current implementation does not preserve the cache name and storage level. 
As a result, the cache name and storage level could change after a REFRESH. 
IMHO, this is not what a user would expect.

I would like to fix this behavior by first saving the cache name and storage 
level and then reusing them when recaching the table.
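
For illustration, here is a rough sketch of the behaviour this change targets, 
using only public APIs in a spark-shell session (the table name and storage 
level are examples):
{code:java}
// Illustrative only: cache a table under an explicit storage level, then refresh it.
// Before the fix, the refresh can re-cache the table with the default cache name and
// storage level instead of the ones chosen below; after the fix both are preserved.
spark.range(10).write.mode("overwrite").saveAsTable("refresh_demo")

spark.sql("CACHE TABLE refresh_demo OPTIONS ('storageLevel' 'DISK_ONLY')")
spark.catalog.refreshTable("refresh_demo")

// Expected after the fix: the cache entry for refresh_demo keeps the DISK_ONLY
// storage level and its original cache name across the refresh.
{code}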



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27062) CatalogImpl.refreshTable should register query in cache with received tableName

2019-03-05 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-27062:
-
Description: 
If _CatalogImpl.refreshTable()_ method is invoked against a cached table, this 
method would first uncache corresponding query in the shared state cache 
manager, and then cache it back to refresh the cache copy. 

However, the table is recached with only the table name; the database name is 
dropped. Therefore, if the cached table is not in the default database, the 
recreated cache entry may refer to a different table. For example, the cached 
table name shown on the driver's storage page may change after the table is 
refreshed. 

Here is related code on github for your reference. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala]
 

 
{code:java}
override def refreshTable(tableName: String): Unit = {
  val tableIdent = 
sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
  val tableMetadata = 
sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
  val table = sparkSession.table(tableIdent)

  if (tableMetadata.tableType == CatalogTableType.VIEW) {
// Temp or persistent views: refresh (or invalidate) any metadata/data 
cached
// in the plan recursively.
table.queryExecution.analyzed.refresh()
  } else {
// Non-temp tables: refresh the metadata cache.
sessionCatalog.refreshTable(tableIdent)
  }

  // If this table is cached as an InMemoryRelation, drop the original
  // cached version and make the new version cached lazily.
  if (isCached(table)) {
// Uncache the logicalPlan.
sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true, 
blocking = true)
// Cache it again.
sparkSession.sharedState.cacheManager.cacheQuery(table, 
Some(tableIdent.table))
  }
}
{code}
 

 CatalogImpl.cacheTable, by contrast, registers the cache with the received 
_tableName_ instead of _tableIdent.table_:
{code:java}
override def cacheTable(tableName: String): Unit = {
  sparkSession.sharedState.cacheManager.cacheQuery(
    sparkSession.table(tableName), Some(tableName))
}
{code}
 

Therefore, I would like to propose aligning the behavior: refreshTable should 
reuse the received _tableName_. Here is the proposed change, from

 
{code:java}
sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableIdent.table))
{code}
to 
{code:java}
sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableName)){code}
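
For reference, here is a hypothetical spark-shell reproduction of the renaming 
(the database and table names below are invented): before the change, the entry 
on the driver's Storage page switches from the full name passed to cacheTable 
to the bare table name once refreshTable runs.

{code:java}
// Run in spark-shell; "mydb" and "t1" are made-up names for illustration.
spark.sql("CREATE DATABASE IF NOT EXISTS mydb")
spark.sql("CREATE TABLE IF NOT EXISTS mydb.t1 (id INT) USING parquet")

// The cache is registered under the received name, "mydb.t1".
spark.catalog.cacheTable("mydb.t1")

// CatalogImpl.refreshTable re-caches the query with Some(tableIdent.table),
// so after this call the Storage page shows only "t1".
spark.catalog.refreshTable("mydb.t1")
{code}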
 

  was:
If CatalogImpl.refreshTable() method is invoked against a cached table, this 
method would first uncache corresponding query in the shared state cache 
manager, and then cache it back to refresh the cache copy. 

However, the table was recached with only 'table name'. The database name will 
be missed. Therefore, if cached table is not on the default database, the 
recreated cache may refer to a different table. For example, we may see the 
cached table name in driver's storage page will be changed after table 
refreshing. 

 

Here is related code on github for your reference. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala]
 

 
{code:java}
override def refreshTable(tableName: String): Unit = {
  val tableIdent = 
sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
  val tableMetadata = 
sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
  val table = sparkSession.table(tableIdent)

  if (tableMetadata.tableType == CatalogTableType.VIEW) {
// Temp or persistent views: refresh (or invalidate) any metadata/data 
cached
// in the plan recursively.
table.queryExecution.analyzed.refresh()
  } else {
// Non-temp tables: refresh the metadata cache.
sessionCatalog.refreshTable(tableIdent)
  }

  // If this table is cached as an InMemoryRelation, drop the original
  // cached version and make the new version cached lazily.
  if (isCached(table)) {
// Uncache the logicalPlan.
sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true, 
blocking = true)
// Cache it again.
sparkSession.sharedState.cacheManager.cacheQuery(table, 
Some(tableIdent.table))
  }
}
{code}
 

 Actually, CatalogImpl.cacheTable registers the cache with the received table 
name, not just the bare table name. 
{code:java}
override def cacheTable(tableName: String): Unit = {
  sparkSession.sharedState.cacheManager.cacheQuery(
    sparkSession.table(tableName), Some(tableName))
}
{code}
 

Therefore, I would like to propose aligning the behavior: refreshTable should 
reuse the received tableName. Here is the proposed change, from

 
{code:java}
sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableIdent.table))
{code}
to 
{code:java}
sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableName))
 {code}
 


> CatalogImpl.refreshTable should register query in cache with 

[jira] [Updated] (SPARK-27062) CatalogImpl.refreshTable should register query in cache with received tableName

2019-03-05 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-27062:
-
Description: 
If CatalogImpl.refreshTable() method is invoked against a cached table, this 
method would first uncache corresponding query in the shared state cache 
manager, and then cache it back to refresh the cache copy. 

However, the table was recached with only 'table name'. The database name will 
be missed. Therefore, if cached table is not on the default database, the 
recreated cache may refer to a different table. For example, we may see the 
cached table name in driver's storage page will be changed after table 
refreshing. 

 

Here is related code on github for your reference. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala]
 

 
{code:java}
override def refreshTable(tableName: String): Unit = {
  val tableIdent = 
sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
  val tableMetadata = 
sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
  val table = sparkSession.table(tableIdent)

  if (tableMetadata.tableType == CatalogTableType.VIEW) {
// Temp or persistent views: refresh (or invalidate) any metadata/data 
cached
// in the plan recursively.
table.queryExecution.analyzed.refresh()
  } else {
// Non-temp tables: refresh the metadata cache.
sessionCatalog.refreshTable(tableIdent)
  }

  // If this table is cached as an InMemoryRelation, drop the original
  // cached version and make the new version cached lazily.
  if (isCached(table)) {
// Uncache the logicalPlan.
sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true, 
blocking = true)
// Cache it again.
sparkSession.sharedState.cacheManager.cacheQuery(table, 
Some(tableIdent.table))
  }
}
{code}
 

 Actually, CatalogImpl.cacheTable registers the cache with the received table 
name, not just the bare table name. 
{code:java}
override def cacheTable(tableName: String): Unit = {
  sparkSession.sharedState.cacheManager.cacheQuery(
    sparkSession.table(tableName), Some(tableName))
}
{code}
 

Therefore, I would like to propose aligning the behavior: refreshTable should 
reuse the received tableName. Here is the proposed change, from

 
{code:java}
sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableIdent.table))
{code}
to 
{code:java}
sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableName))
 {code}
 

  was:
If CatalogImpl.refreshTable() method is invoked against a cached table, this 
method would first uncache corresponding query in the shared state cache 
manager, and then cache it back to refresh the cache copy. 

However, the table was recached with only 'table name'. The database name will 
be missed. Therefore, if cached table is not on the default database, the 
recreated cache may refer to a different table. For example, we may see the 
cached table name in driver's storage page will be changed after table 
refreshing. 

 

Here is related code on github for your reference. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala]
 

 
{code:java}
override def refreshTable(tableName: String): Unit = {
  val tableIdent = 
sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
  val tableMetadata = 
sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
  val table = sparkSession.table(tableIdent)

  if (tableMetadata.tableType == CatalogTableType.VIEW) {
// Temp or persistent views: refresh (or invalidate) any metadata/data 
cached
// in the plan recursively.
table.queryExecution.analyzed.refresh()
  } else {
// Non-temp tables: refresh the metadata cache.
sessionCatalog.refreshTable(tableIdent)
  }

  // If this table is cached as an InMemoryRelation, drop the original
  // cached version and make the new version cached lazily.
  if (isCached(table)) {
// Uncache the logicalPlan.
sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true, 
blocking = true)
// Cache it again.
sparkSession.sharedState.cacheManager.cacheQuery(table, 
Some(tableIdent.table))
  }
}
{code}
 

 

In the Spark SQL module, the database name is registered together with the 
table name when the "CACHE TABLE" command is executed. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala]
 

and CatalogImpl registers the cache with the received table name: 
{code:java}
override def cacheTable(tableName: String): Unit = {
  sparkSession.sharedState.cacheManager.cacheQuery(
    sparkSession.table(tableName), Some(tableName))
}
{code}
 

Therefore, I would like to propose aligning the behavior: refreshTable should 
reuse the received table name instead. 

 
{code:java}
sparkSession.sharedState.cacheManager.cacheQuery(table, 

[jira] [Updated] (SPARK-27062) CatalogImpl.refreshTable should register query in cache with received tableName

2019-03-05 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-27062:
-
Summary: CatalogImpl.refreshTable should register query in cache with 
received tableName  (was: Refresh Table command register table with table name 
only)

> CatalogImpl.refreshTable should register query in cache with received 
> tableName
> ---
>
> Key: SPARK-27062
> URL: https://issues.apache.org/jira/browse/SPARK-27062
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: William Wong
>Priority: Minor
>  Labels: easyfix, pull-request-available
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> If CatalogImpl.refreshTable() method is invoked against a cached table, this 
> method would first uncache corresponding query in the shared state cache 
> manager, and then cache it back to refresh the cache copy. 
> However, the table was recached with only 'table name'. The database name 
> will be missed. Therefore, if cached table is not on the default database, 
> the recreated cache may refer to a different table. For example, we may see 
> the cached table name in driver's storage page will be changed after table 
> refreshing. 
>  
> Here is related code on github for your reference. 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala]
>  
>  
> {code:java}
> override def refreshTable(tableName: String): Unit = {
>   val tableIdent = 
> sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
>   val tableMetadata = 
> sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
>   val table = sparkSession.table(tableIdent)
>   if (tableMetadata.tableType == CatalogTableType.VIEW) {
> // Temp or persistent views: refresh (or invalidate) any metadata/data 
> cached
> // in the plan recursively.
> table.queryExecution.analyzed.refresh()
>   } else {
> // Non-temp tables: refresh the metadata cache.
> sessionCatalog.refreshTable(tableIdent)
>   }
>   // If this table is cached as an InMemoryRelation, drop the original
>   // cached version and make the new version cached lazily.
>   if (isCached(table)) {
> // Uncache the logicalPlan.
> sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true, 
> blocking = true)
> // Cache it again.
> sparkSession.sharedState.cacheManager.cacheQuery(table, 
> Some(tableIdent.table))
>   }
> }
> {code}
>  
>  
> In Spark SQL module, the database name is registered together with table name 
> when "CACHE TABLE" command was executed. 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala]
>  
> and  CatalogImpl register cache with received table name. 
> {code:java}
> override def cacheTable(tableName: String): Unit = {
> sparkSession.sharedState.cacheManager.cacheQuery(sparkSession.table(tableName),
>  Some(tableName)) }
> {code}
>  
> Therefore, I would like to propose aligning the behavior. RefreshTable method 
> should reuse the received table name instead. 
>  
> {code:java}
> sparkSession.sharedState.cacheManager.cacheQuery(table, 
> Some(tableIdent.table))
> {code}
> to 
> {code:java}
> sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableName))
>  {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27062) Refresh Table command register table with table name only

2019-03-05 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-27062:
-
Priority: Minor  (was: Major)

> Refresh Table command register table with table name only
> -
>
> Key: SPARK-27062
> URL: https://issues.apache.org/jira/browse/SPARK-27062
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: William Wong
>Priority: Minor
>  Labels: easyfix, pull-request-available
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> If CatalogImpl.refreshTable() method is invoked against a cached table, this 
> method would first uncache corresponding query in the shared state cache 
> manager, and then cache it back to refresh the cache copy. 
> However, the table was recached with only 'table name'. The database name 
> will be missed. Therefore, if cached table is not on the default database, 
> the recreated cache may refer to a different table. For example, we may see 
> the cached table name in driver's storage page will be changed after table 
> refreshing. 
>  
> Here is related code on github for your reference. 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala]
>  
>  
> {code:java}
> override def refreshTable(tableName: String): Unit = {
>   val tableIdent = 
> sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
>   val tableMetadata = 
> sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
>   val table = sparkSession.table(tableIdent)
>   if (tableMetadata.tableType == CatalogTableType.VIEW) {
> // Temp or persistent views: refresh (or invalidate) any metadata/data 
> cached
> // in the plan recursively.
> table.queryExecution.analyzed.refresh()
>   } else {
> // Non-temp tables: refresh the metadata cache.
> sessionCatalog.refreshTable(tableIdent)
>   }
>   // If this table is cached as an InMemoryRelation, drop the original
>   // cached version and make the new version cached lazily.
>   if (isCached(table)) {
> // Uncache the logicalPlan.
> sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true, 
> blocking = true)
> // Cache it again.
> sparkSession.sharedState.cacheManager.cacheQuery(table, 
> Some(tableIdent.table))
>   }
> }
> {code}
>  
>  
> In Spark SQL module, the database name is registered together with table name 
> when "CACHE TABLE" command was executed. 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala]
>  
> and  CatalogImpl register cache with received table name. 
> {code:java}
> override def cacheTable(tableName: String): Unit = {
> sparkSession.sharedState.cacheManager.cacheQuery(sparkSession.table(tableName),
>  Some(tableName)) }
> {code}
>  
> Therefore, I would like to propose aligning the behavior. RefreshTable method 
> should reuse the received table name instead. 
>  
> {code:java}
> sparkSession.sharedState.cacheManager.cacheQuery(table, 
> Some(tableIdent.table))
> {code}
> to 
> {code:java}
> sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableName))
>  {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27062) Refresh Table command register table with table name only

2019-03-05 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-27062:
-
Labels: easyfix pull-request-available  (was: easyfix)

> Refresh Table command register table with table name only
> -
>
> Key: SPARK-27062
> URL: https://issues.apache.org/jira/browse/SPARK-27062
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: William Wong
>Priority: Major
>  Labels: easyfix, pull-request-available
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> If CatalogImpl.refreshTable() method is invoked against a cached table, this 
> method would first uncache corresponding query in the shared state cache 
> manager, and then cache it back to refresh the cache copy. 
> However, the table was recached with only 'table name'. The database name 
> will be missed. Therefore, if cached table is not on the default database, 
> the recreated cache may refer to a different table. For example, we may see 
> the cached table name in driver's storage page will be changed after table 
> refreshing. 
>  
> Here is related code on github for your reference. 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala]
>  
>  
> {code:java}
> override def refreshTable(tableName: String): Unit = {
>   val tableIdent = 
> sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
>   val tableMetadata = 
> sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
>   val table = sparkSession.table(tableIdent)
>   if (tableMetadata.tableType == CatalogTableType.VIEW) {
> // Temp or persistent views: refresh (or invalidate) any metadata/data 
> cached
> // in the plan recursively.
> table.queryExecution.analyzed.refresh()
>   } else {
> // Non-temp tables: refresh the metadata cache.
> sessionCatalog.refreshTable(tableIdent)
>   }
>   // If this table is cached as an InMemoryRelation, drop the original
>   // cached version and make the new version cached lazily.
>   if (isCached(table)) {
> // Uncache the logicalPlan.
> sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true, 
> blocking = true)
> // Cache it again.
> sparkSession.sharedState.cacheManager.cacheQuery(table, 
> Some(tableIdent.table))
>   }
> }
> {code}
>  
>  
> In Spark SQL module, the database name is registered together with table name 
> when "CACHE TABLE" command was executed. 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala]
>  
> and  CatalogImpl register cache with received table name. 
> {code:java}
> override def cacheTable(tableName: String): Unit = {
> sparkSession.sharedState.cacheManager.cacheQuery(sparkSession.table(tableName),
>  Some(tableName)) }
> {code}
>  
> Therefore, I would like to propose aligning the behavior. RefreshTable method 
> should reuse the received table name instead. 
>  
> {code:java}
> sparkSession.sharedState.cacheManager.cacheQuery(table, 
> Some(tableIdent.table))
> {code}
> to 
> {code:java}
> sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableName))
>  {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27062) Refresh Table command register table with table name only

2019-03-05 Thread William Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Wong updated SPARK-27062:
-
Description: 
If CatalogImpl.refreshTable() method is invoked against a cached table, this 
method would first uncache corresponding query in the shared state cache 
manager, and then cache it back to refresh the cache copy. 

However, the table was recached with only 'table name'. The database name will 
be missed. Therefore, if cached table is not on the default database, the 
recreated cache may refer to a different table. For example, we may see the 
cached table name in driver's storage page will be changed after table 
refreshing. 

 

Here is related code on github for your reference. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala]
 

 
{code:java}
override def refreshTable(tableName: String): Unit = {
  val tableIdent = 
sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
  val tableMetadata = 
sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
  val table = sparkSession.table(tableIdent)

  if (tableMetadata.tableType == CatalogTableType.VIEW) {
// Temp or persistent views: refresh (or invalidate) any metadata/data 
cached
// in the plan recursively.
table.queryExecution.analyzed.refresh()
  } else {
// Non-temp tables: refresh the metadata cache.
sessionCatalog.refreshTable(tableIdent)
  }

  // If this table is cached as an InMemoryRelation, drop the original
  // cached version and make the new version cached lazily.
  if (isCached(table)) {
// Uncache the logicalPlan.
sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true, 
blocking = true)
// Cache it again.
sparkSession.sharedState.cacheManager.cacheQuery(table, 
Some(tableIdent.table))
  }
}
{code}
 

 

In the Spark SQL module, the database name is registered together with the 
table name when the "CACHE TABLE" command is executed. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala]
 

and CatalogImpl registers the cache with the received table name: 
{code:java}
override def cacheTable(tableName: String): Unit = {
  sparkSession.sharedState.cacheManager.cacheQuery(
    sparkSession.table(tableName), Some(tableName))
}
{code}
 

Therefore, I would like to propose aligning the behavior: refreshTable should 
reuse the received table name instead. 

 
{code:java}
sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableIdent.table))
{code}
to 
{code:java}
sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableName))
 {code}
 

  was:
If CatalogImpl.refreshTable() method is invoked against a cached table, this 
method would first uncache corresponding query in the shared state cache 
manager, and then cache it back to refresh the cache copy. 

However, the table was recached with only 'table name'. The database name will 
be missed. Therefore, if cached table is not on the default database, the 
recreated cache may refer to a different table. For example, we may see the 
cached table name in driver's storage page will be changed after table 
refreshing. 

 

Here is related code on github for your reference. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala]
 

 
{code:java}
override def refreshTable(tableName: String): Unit = {
  val tableIdent = 
sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
  val tableMetadata = 
sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
  val table = sparkSession.table(tableIdent)

  if (tableMetadata.tableType == CatalogTableType.VIEW) {
// Temp or persistent views: refresh (or invalidate) any metadata/data 
cached
// in the plan recursively.
table.queryExecution.analyzed.refresh()
  } else {
// Non-temp tables: refresh the metadata cache.
sessionCatalog.refreshTable(tableIdent)
  }

  // If this table is cached as an InMemoryRelation, drop the original
  // cached version and make the new version cached lazily.
  if (isCached(table)) {
// Uncache the logicalPlan.
sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true, 
blocking = true)
// Cache it again.
sparkSession.sharedState.cacheManager.cacheQuery(table, 
Some(tableIdent.table))
  }
}
{code}
 

 

In the Spark SQL module, the database name is registered together with the 
table name when the "CACHE TABLE" command is executed. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala]
 

 

 

Therefore, I would like to propose aligning the behavior: the full table name 
should also be used in the refreshTable case. We should change the following 
line in CatalogImpl.refreshTable from 

 
{code:java}

[jira] [Created] (SPARK-27062) Refresh Table command register table with table name only

2019-03-05 Thread William Wong (JIRA)
William Wong created SPARK-27062:


 Summary: Refresh Table command register table with table name only
 Key: SPARK-27062
 URL: https://issues.apache.org/jira/browse/SPARK-27062
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.2
Reporter: William Wong


If the CatalogImpl.refreshTable() method is invoked against a cached table, it 
first uncaches the corresponding query in the shared state cache manager and 
then caches it again to refresh the cached copy. 

However, the table is recached with only the bare table name; the database name 
is dropped. Therefore, if the cached table is not in the default database, the 
recreated cache entry may refer to a different table. For example, the cached 
table name shown on the driver's Storage page changes after the table is 
refreshed. 

 

Here is the related code on GitHub for reference. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala]
 

 
{code:java}
override def refreshTable(tableName: String): Unit = {
  val tableIdent = 
sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
  val tableMetadata = 
sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
  val table = sparkSession.table(tableIdent)

  if (tableMetadata.tableType == CatalogTableType.VIEW) {
// Temp or persistent views: refresh (or invalidate) any metadata/data 
cached
// in the plan recursively.
table.queryExecution.analyzed.refresh()
  } else {
// Non-temp tables: refresh the metadata cache.
sessionCatalog.refreshTable(tableIdent)
  }

  // If this table is cached as an InMemoryRelation, drop the original
  // cached version and make the new version cached lazily.
  if (isCached(table)) {
// Uncache the logicalPlan.
sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true, 
blocking = true)
// Cache it again.
sparkSession.sharedState.cacheManager.cacheQuery(table, 
Some(tableIdent.table))
  }
}
{code}
 

 

In the Spark SQL module, the database name is registered together with the 
table name when the "CACHE TABLE" command is executed. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala]
 

 

 

Therefore, I would like to propose aligning the behavior: the full table name 
should also be used in the refreshTable case. We should change the following 
line in CatalogImpl.refreshTable from 

 
{code:java}
sparkSession.sharedState.cacheManager.cacheQuery(table, Some(tableIdent.table))
{code}
to

 

 
{code:java}
sparkSession.sharedState.cacheManager.cacheQuery(table, 
Some(tableIdent.quotedString))
 {code}
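
For clarity, the two identifier forms under discussion differ as follows; this 
is just a small illustrative snippet with made-up names.

{code:java}
import org.apache.spark.sql.catalyst.TableIdentifier

val tableIdent = TableIdentifier("t1", Some("mydb"))

tableIdent.table         // "t1"           -- bare table name only
tableIdent.quotedString  // "`mydb`.`t1`"  -- database-qualified, back-quoted
{code}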
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down

2019-03-01 Thread William Wong (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782237#comment-16782237
 ] 

William Wong commented on SPARK-24130:
--

https://github.com/apache/spark/pull/22547 
It seems that the PR was closed already. 

> Data Source V2: Join Push Down
> --
>
> Key: SPARK-24130
> URL: https://issues.apache.org/jira/browse/SPARK-24130
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Jia Li
>Priority: Major
> Attachments: Data Source V2 Join Push Down.pdf
>
>
> Spark applications often directly query external data sources such as 
> relational databases, or files. Spark provides Data Sources APIs for 
> accessing structured data through Spark SQL. Data Sources APIs in both V1 and 
> V2 support optimizations such as Filter push down and Column pruning which 
> are subset of the functionality that can be pushed down to some data sources. 
> We’re proposing to extend Data Sources APIs with join push down (JPD). Join 
> push down significantly improves query performance by reducing the amount of 
> data transfer and exploiting the capabilities of the data sources such as 
> index access.
> Join push down design document is available 
> [here|https://docs.google.com/document/d/1k-kRadTcUbxVfUQwqBbIXs_yPZMxh18-e-cz77O_TaE/edit?usp=sharing].
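
To make the idea concrete, here is a purely conceptual sketch (this is not an 
existing Spark API; the connection details and table names are placeholders): 
with join push down, a join between two tables that live in the same JDBC 
source could be evaluated by the source itself, instead of Spark scanning both 
tables and joining them locally.

{code:java}
// Conceptual illustration only; option values are placeholders.
val orders = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://db.example.com/sales")
  .option("dbtable", "orders")
  .load()

val customers = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://db.example.com/sales")
  .option("dbtable", "customers")
  .load()

// Today: Spark scans both tables remotely and performs the join itself.
val joined = orders.join(customers, Seq("customer_id"))

// With join push down, the source could instead receive a single query such as:
//   SELECT ... FROM orders o JOIN customers c ON o.customer_id = c.customer_id
{code}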



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down

2019-02-28 Thread William Wong (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780735#comment-16780735
 ] 

William Wong commented on SPARK-24130:
--

Hi [~smilegator], yes, it will be a very valuable enhancement. I would 
appreciate it if you could let us know the progress. Many thanks. 

> Data Source V2: Join Push Down
> --
>
> Key: SPARK-24130
> URL: https://issues.apache.org/jira/browse/SPARK-24130
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Jia Li
>Priority: Major
> Attachments: Data Source V2 Join Push Down.pdf
>
>
> Spark applications often directly query external data sources such as 
> relational databases, or files. Spark provides Data Sources APIs for 
> accessing structured data through Spark SQL. Data Sources APIs in both V1 and 
> V2 support optimizations such as Filter push down and Column pruning which 
> are subset of the functionality that can be pushed down to some data sources. 
> We’re proposing to extend Data Sources APIs with join push down (JPD). Join 
> push down significantly improves query performance by reducing the amount of 
> data transfer and exploiting the capabilities of the data sources such as 
> index access.
> Join push down design document is available 
> [here|https://docs.google.com/document/d/1k-kRadTcUbxVfUQwqBbIXs_yPZMxh18-e-cz77O_TaE/edit?usp=sharing].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org