[jira] [Assigned] (SPARK-26321) Split a SQL in a correct way

2019-10-12 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-26321:
---

Assignee: Yuming Wang

> Split a SQL in a correct way
> 
>
> Key: SPARK-26321
> URL: https://issues.apache.org/jira/browse/SPARK-26321
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Darcy Shen
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> First:
> ./build/mvn -Phive-thriftserver -DskipTests package
>  
> Then:
> $ bin/spark-sql
>  
> 18/12/10 19:35:02 INFO SparkSQLCLIDriver: Time taken: 4.483 seconds, Fetched 
> 1 row(s)
>  spark-sql> select "1;2";
>  Error in query:
>  no viable alternative at input 'select "'(line 1, pos 7)
> == SQL ==
>  select "1
>  ---^^^
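
The error output above shows the statement was split at the ';' inside the
string literal. A minimal sketch of quote-aware splitting (illustrative only;
not the actual SparkSQLCLIDriver code):

{code:scala}
// Split a command line on ';' while respecting single- and double-quoted
// literals (with backslash escapes inside literals).
def splitStatements(line: String): Seq[String] = {
  val statements = scala.collection.mutable.ArrayBuffer.empty[String]
  val current = new StringBuilder
  var quote: Option[Char] = None  // Some('"') or Some('\'') while inside a literal
  var escaped = false
  for (c <- line) {
    if (escaped) { current += c; escaped = false }
    else if (quote.isDefined && c == '\\') { current += c; escaped = true }
    else if (quote.contains(c)) { current += c; quote = None }  // closing quote
    else if (quote.isEmpty && (c == '"' || c == '\'')) { current += c; quote = Some(c) }
    else if (quote.isEmpty && c == ';') { statements += current.toString.trim; current.clear() }
    else current += c
  }
  if (current.toString.trim.nonEmpty) statements += current.toString.trim
  statements.toSeq
}

// splitStatements("""select "1;2";""") yields a single statement: select "1;2"
{code}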



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26321) Split a SQL in a correct way

2019-10-12 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-26321.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25018
[https://github.com/apache/spark/pull/25018]

> Split a SQL in a correct way
> 
>
> Key: SPARK-26321
> URL: https://issues.apache.org/jira/browse/SPARK-26321
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Darcy Shen
>Priority: Major
> Fix For: 3.0.0
>
>
> First:
> ./build/mvn -Phive-thriftserver -DskipTests package
>  
> Then:
> $ bin/spark-sql
>  
> 18/12/10 19:35:02 INFO SparkSQLCLIDriver: Time taken: 4.483 seconds, Fetched 
> 1 row(s)
>  spark-sql> select "1;2";
>  Error in query:
>  no viable alternative at input 'select "'(line 1, pos 7)
> == SQL ==
>  select "1
>  ---^^^



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29359) Better exception handling in SQLQueryTestSuite and ThriftServerQueryTestSuite

2019-10-12 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-29359.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26028
[https://github.com/apache/spark/pull/26028]

> Better exception handling in SQLQueryTestSuite and ThriftServerQueryTestSuite
> -
>
> Key: SPARK-29359
> URL: https://issues.apache.org/jira/browse/SPARK-29359
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Minor
> Fix For: 3.0.0
>
>
> SQLQueryTestSuite and ThriftServerQueryTestSuite should have the same 
> exception handling to avoid issues like this:
> {noformat}
>   Expected "[Recursion level limit 100 reached but query has not exhausted, 
> try increasing spark.sql.cte.recursion.level.limit
>   org.apache.spark.SparkException]", but got "[org.apache.spark.SparkException
>   Recursion level limit 100 reached but query has not exhausted, try 
> increasing spark.sql.cte.recursion.level.limit]"
> {noformat}
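
The two outputs above contain the same information in a different order
(message before exception class name versus class name before message). A
sketch of the kind of shared normalization both suites need, rendering the
class name first and the message second (illustrative; not the actual patch):

{code:scala}
// Render any thrown exception the same way in both suites so that
// golden-file comparisons are stable across them.
def normalizeException(e: Throwable): String =
  s"${e.getClass.getName}\n${e.getMessage}"
{code}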



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29359) Better exception handling in SQLQueryTestSuite and ThriftServerQueryTestSuite

2019-10-12 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-29359:
---

Assignee: Peter Toth

> Better exception handling in SQLQueryTestSuite and ThriftServerQueryTestSuite
> -
>
> Key: SPARK-29359
> URL: https://issues.apache.org/jira/browse/SPARK-29359
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Minor
>
> SQLQueryTestSuite and ThriftServerQueryTestSuite should have the same 
> exception handling to avoid issues like this:
> {noformat}
>   Expected "[Recursion level limit 100 reached but query has not exhausted, 
> try increasing spark.sql.cte.recursion.level.limit
>   org.apache.spark.SparkException]", but got "[org.apache.spark.SparkException
>   Recursion level limit 100 reached but query has not exhausted, try 
> increasing spark.sql.cte.recursion.level.limit]"
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29435) Spark 3 doesnt work with older shuffle service

2019-10-12 Thread Sandeep Katta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950212#comment-16950212
 ] 

Sandeep Katta commented on SPARK-29435:
---

cc [~cloud_fan] [~XuanYuan] This patch has been tested by [~koert]; please help
review it.

> Spark 3 doesnt work with older shuffle service
> --
>
> Key: SPARK-29435
> URL: https://issues.apache.org/jira/browse/SPARK-29435
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.0.0
> Environment: Spark 3 from Sept 26, commit 
> 8beb736a00b004f97de7fcdf9ff09388d80fc548
> Spark 2.4.1 shuffle service in yarn 
>Reporter: koert kuipers
>Priority: Major
>
> SPARK-27665 introduced a change to the shuffle protocol. It also introduced a
> setting, spark.shuffle.useOldFetchProtocol, which should allow Spark 3 to run
> with an old shuffle service.
> However, I have not gotten that to work. I have been testing with Spark 3
> master (from Sept 26) and the shuffle service from Spark 2.4.1 on YARN.
> For example, on EMR I see:
> {code}
> Error occurred while fetching local blocks
> java.nio.file.NoSuchFileException: 
> /mnt1/yarn/usercache/hadoop/appcache/application_1570697024032_0058/blockmgr-d1d009b1-1c95-4e2a-9a71-0ff20078b9a8/38/shuffle_0_0_0.index
> {code}
> And on CDH5:
> {code}
> org.apache.spark.shuffle.FetchFailedException: 
> /data/9/hadoop/nm/usercache/koert/appcache/application_1568061697664_8250/blockmgr-57f28014-cdf2-431e-8e11-447ba5c2b2f2/0b/shuffle_0_0_0.index
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:596)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:511)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:67)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
>   at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>   at scala.collection.Iterator$SliceIterator.hasNext(Iterator.scala:266)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:337)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:850)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:850)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:291)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:127)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:455)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.file.NoSuchFileException: 
> /data/9/hadoop/nm/usercache/koert/appcache/application_1568061697664_8250/blockmgr-57f28014-cdf2-431e-8e11-447ba5c2b2f2/0b/shuffle_0_0_0.index
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
>   at java.nio.file.Files.newByteChannel(Files.java:361)
>   at java.nio.file.Files.newByteChannel(Files.java:407)
>   at 
> org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:204)
>   at 
> org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:551)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:349)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:391)
>   at 
> 

[jira] [Resolved] (SPARK-29368) Port interval.sql

2019-10-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29368.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26055
[https://github.com/apache/spark/pull/26055]

> Port interval.sql
> -
>
> Key: SPARK-29368
> URL: https://issues.apache.org/jira/browse/SPARK-29368
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> Here is interval.sql: 
> [https://raw.githubusercontent.com/postgres/postgres/REL_12_STABLE/src/test/regress/sql/interval.sql]
> Results: 
> https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/expected/interval.out



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29368) Port interval.sql

2019-10-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29368:
-

Assignee: Maxim Gekk

> Port interval.sql
> -
>
> Key: SPARK-29368
> URL: https://issues.apache.org/jira/browse/SPARK-29368
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> Here is interval.sql: 
> [https://raw.githubusercontent.com/postgres/postgres/REL_12_STABLE/src/test/regress/sql/interval.sql]
> Results: 
> https://github.com/postgres/postgres/blob/REL_12_STABLE/src/test/regress/expected/interval.out



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-29435) Spark 3 doesnt work with older shuffle service

2019-10-12 Thread koert kuipers (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950122#comment-16950122
 ] 

koert kuipers edited comment on SPARK-29435 at 10/12/19 11:50 PM:
--

I checked the patch and it works with dynamic execution using the Spark 3
shuffle service and using the Spark 2 shuffle service.


was (Author: koert):
I checked and it works with dynamic execution using the Spark 3 shuffle service
and using the Spark 2 shuffle service.

> Spark 3 doesnt work with older shuffle service
> --
>
> Key: SPARK-29435
> URL: https://issues.apache.org/jira/browse/SPARK-29435
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.0.0
> Environment: Spark 3 from Sept 26, commit 
> 8beb736a00b004f97de7fcdf9ff09388d80fc548
> Spark 2.4.1 shuffle service in yarn 
>Reporter: koert kuipers
>Priority: Major
>
> SPARK-27665 introduced a change to the shuffle protocol. It also introduced a
> setting, spark.shuffle.useOldFetchProtocol, which should allow Spark 3 to run
> with an old shuffle service.
> However, I have not gotten that to work. I have been testing with Spark 3
> master (from Sept 26) and the shuffle service from Spark 2.4.1 on YARN.
> For example, on EMR I see:
> {code}
> Error occurred while fetching local blocks
> java.nio.file.NoSuchFileException: 
> /mnt1/yarn/usercache/hadoop/appcache/application_1570697024032_0058/blockmgr-d1d009b1-1c95-4e2a-9a71-0ff20078b9a8/38/shuffle_0_0_0.index
> {code}
> And on CDH5:
> {code}
> org.apache.spark.shuffle.FetchFailedException: 
> /data/9/hadoop/nm/usercache/koert/appcache/application_1568061697664_8250/blockmgr-57f28014-cdf2-431e-8e11-447ba5c2b2f2/0b/shuffle_0_0_0.index
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:596)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:511)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:67)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
>   at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>   at scala.collection.Iterator$SliceIterator.hasNext(Iterator.scala:266)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:337)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:850)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:850)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:291)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:127)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:455)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.file.NoSuchFileException: 
> /data/9/hadoop/nm/usercache/koert/appcache/application_1568061697664_8250/blockmgr-57f28014-cdf2-431e-8e11-447ba5c2b2f2/0b/shuffle_0_0_0.index
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
>   at java.nio.file.Files.newByteChannel(Files.java:361)
>   at java.nio.file.Files.newByteChannel(Files.java:407)
>   at 
> org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:204)
>   at 
> org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:551)
>   at 
> 

[jira] [Commented] (SPARK-29435) Spark 3 doesnt work with older shuffle service

2019-10-12 Thread koert kuipers (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950122#comment-16950122
 ] 

koert kuipers commented on SPARK-29435:
---

I checked and it works with dynamic execution using the Spark 3 shuffle service
and using the Spark 2 shuffle service.
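
For anyone reproducing this, a minimal sketch of the relevant configuration
(assumptions: YARN with the external shuffle service enabled;
spark.shuffle.useOldFetchProtocol is the fallback flag named in the
description below):

{code:scala}
import org.apache.spark.SparkConf

// Fall back to the pre-3.0 shuffle fetch protocol so a Spark 3 application
// can talk to a Spark 2.x external shuffle service.
val conf = new SparkConf()
  .set("spark.shuffle.useOldFetchProtocol", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.enabled", "true")
{code}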

> Spark 3 doesnt work with older shuffle service
> --
>
> Key: SPARK-29435
> URL: https://issues.apache.org/jira/browse/SPARK-29435
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.0.0
> Environment: Spark 3 from Sept 26, commit 
> 8beb736a00b004f97de7fcdf9ff09388d80fc548
> Spark 2.4.1 shuffle service in yarn 
>Reporter: koert kuipers
>Priority: Major
>
> SPARK-27665 introduced a change to the shuffle protocol. It also introduced a
> setting, spark.shuffle.useOldFetchProtocol, which should allow Spark 3 to run
> with an old shuffle service.
> However, I have not gotten that to work. I have been testing with Spark 3
> master (from Sept 26) and the shuffle service from Spark 2.4.1 on YARN.
> For example, on EMR I see:
> {code}
> Error occurred while fetching local blocks
> java.nio.file.NoSuchFileException: 
> /mnt1/yarn/usercache/hadoop/appcache/application_1570697024032_0058/blockmgr-d1d009b1-1c95-4e2a-9a71-0ff20078b9a8/38/shuffle_0_0_0.index
> {code}
> And on CDH5:
> {code}
> org.apache.spark.shuffle.FetchFailedException: 
> /data/9/hadoop/nm/usercache/koert/appcache/application_1568061697664_8250/blockmgr-57f28014-cdf2-431e-8e11-447ba5c2b2f2/0b/shuffle_0_0_0.index
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:596)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:511)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:67)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
>   at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>   at scala.collection.Iterator$SliceIterator.hasNext(Iterator.scala:266)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:337)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:850)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:850)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:291)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:127)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:455)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.file.NoSuchFileException: 
> /data/9/hadoop/nm/usercache/koert/appcache/application_1568061697664_8250/blockmgr-57f28014-cdf2-431e-8e11-447ba5c2b2f2/0b/shuffle_0_0_0.index
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
>   at java.nio.file.Files.newByteChannel(Files.java:361)
>   at java.nio.file.Files.newByteChannel(Files.java:407)
>   at 
> org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:204)
>   at 
> org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:551)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:349)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:391)
>   at 
> 

[jira] [Resolved] (SPARK-29446) Upgrade netty-all to 4.1.42 and fix vulnerabilities.

2019-10-12 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-29446.
-
Resolution: Duplicate

> Upgrade netty-all to 4.1.42 and fix vulnerabilities.
> 
>
> Key: SPARK-29446
> URL: https://issues.apache.org/jira/browse/SPARK-29446
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current code uses io.netty:netty-all:jar:4.1.17, which has a known
> security vulnerability; see
> [https://www.tenable.com/cve/CVE-2019-16869].
> This reference recommends upgrading netty-all to 4.1.42 or later.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29449) Add tooltip to Spark WebUI

2019-10-12 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-29449:

Description: 
The initial effort was made in
https://issues.apache.org/jira/browse/SPARK-2384. This umbrella Jira tracks
the progress of adding tooltips to all of the Web UI for better usability.

 

  was:This umbrella Jira is to track the progress of adding tooltips to all of
the Web UI for better usability.


> Add tooltip to Spark WebUI
> --
>
> Key: SPARK-29449
> URL: https://issues.apache.org/jira/browse/SPARK-29449
> Project: Spark
>  Issue Type: Umbrella
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Major
>
> The initial effort was made in
> https://issues.apache.org/jira/browse/SPARK-2384. This umbrella Jira tracks
> the progress of adding tooltips to all of the Web UI for better usability.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29019) Improve tooltip information in JDBC/ODBC Server tab

2019-10-12 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-29019:

Parent: SPARK-29449
Issue Type: Sub-task  (was: Improvement)

> Improve tooltip information in JDBC/ODBC Server tab
> ---
>
> Key: SPARK-29019
> URL: https://issues.apache.org/jira/browse/SPARK-29019
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Pablo Langa Blanco
>Assignee: Pablo Langa Blanco
>Priority: Trivial
> Fix For: 3.0.0
>
>
> Some of the columns of the JDBC/ODBC Server tab in the Web UI are hard to
> understand.
> We documented them in SPARK-28373, but I think it is better to also have
> tooltips in the SQL statistics table to explain the columns.
> More information is in the pull request.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29323) Add tooltip for The Executors Tab's column names in the Spark history server Page

2019-10-12 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-29323:

Parent: SPARK-29449
Issue Type: Sub-task  (was: Improvement)

> Add tooltip for The Executors Tab's column names in the Spark history server 
> Page
> -
>
> Key: SPARK-29323
> URL: https://issues.apache.org/jira/browse/SPARK-29323
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: liucht-inspur
>Assignee: liucht-inspur
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: image-2019-10-04-09-42-14-174.png
>
>
> On the Executors tab of the Spark history server page, the Summary section
> shows a row of column titles, but their formatting is irregular.
> Some column names have tooltips, such as Storage Memory, Task Time (GC Time),
> Input, Shuffle Read, Shuffle Write and Blacklisted, but some columns still
> lack tooltips: RDD Blocks, Disk Used, Cores, Active Tasks, Failed Tasks,
> Complete Tasks and Total Tasks. Oddly, in the Executors section below, all
> of these column names do have tooltips.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29449) Add tooltip to Spark WebUI

2019-10-12 Thread Xiao Li (Jira)
Xiao Li created SPARK-29449:
---

 Summary: Add tooltip to Spark WebUI
 Key: SPARK-29449
 URL: https://issues.apache.org/jira/browse/SPARK-29449
 Project: Spark
  Issue Type: Umbrella
  Components: Web UI
Affects Versions: 3.0.0
Reporter: Xiao Li


This umbrella Jira tracks the progress of adding tooltips to all of the Web UI
for better usability.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29433) Web UI Stages table tooltip correction

2019-10-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-29433.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26084
[https://github.com/apache/spark/pull/26084]

> Web UI Stages table tooltip correction
> --
>
> Key: SPARK-29433
> URL: https://issues.apache.org/jira/browse/SPARK-29433
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Pablo Langa Blanco
>Assignee: Pablo Langa Blanco
>Priority: Trivial
> Fix For: 3.0.0
>
>
> In the Web UI's Stages table, the tooltips of the Input and Output columns
> are not correct.
> Current tooltip messages:
>  * Bytes and records read from Hadoop or from Spark storage.
>  * Bytes and records written to Hadoop.
> In these columns we only show bytes, not records.
> More information is in the pull request.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29433) Web UI Stages table tooltip correction

2019-10-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-29433:


Assignee: Pablo Langa Blanco

> Web UI Stages table tooltip correction
> --
>
> Key: SPARK-29433
> URL: https://issues.apache.org/jira/browse/SPARK-29433
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Pablo Langa Blanco
>Assignee: Pablo Langa Blanco
>Priority: Trivial
>
> In the Web UI's Stages table, the tooltips of the Input and Output columns
> are not correct.
> Current tooltip messages:
>  * Bytes and records read from Hadoop or from Spark storage.
>  * Bytes and records written to Hadoop.
> In these columns we only show bytes, not records.
> More information is in the pull request.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29445) Bump netty-all from 4.1.39.Final to 4.1.42.Final

2019-10-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-29445:


Assignee: Fokko Driesprong

> Bump netty-all from 4.1.39.Final to 4.1.42.Final
> 
>
> Key: SPARK-29445
> URL: https://issues.apache.org/jira/browse/SPARK-29445
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>
> https://www.cvedetails.com/cve/CVE-2019-16869/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29445) Bump netty-all from 4.1.39.Final to 4.1.42.Final

2019-10-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-29445.
--
Fix Version/s: 3.0.0
   2.4.5
   Resolution: Fixed

Issue resolved by pull request 26099
[https://github.com/apache/spark/pull/26099]

> Bump netty-all from 4.1.39.Final to 4.1.42.Final
> 
>
> Key: SPARK-29445
> URL: https://issues.apache.org/jira/browse/SPARK-29445
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> https://www.cvedetails.com/cve/CVE-2019-16869/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29323) Add tooltip for The Executors Tab's column names in the Spark history server Page

2019-10-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-29323:


Assignee: liucht-inspur

> Add tooltip for The Executors Tab's column names in the Spark history server 
> Page
> -
>
> Key: SPARK-29323
> URL: https://issues.apache.org/jira/browse/SPARK-29323
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: liucht-inspur
>Assignee: liucht-inspur
>Priority: Trivial
> Attachments: image-2019-10-04-09-42-14-174.png
>
>
> On the Executors tab of the Spark history server page, the Summary section
> shows a row of column titles, but their formatting is irregular.
> Some column names have tooltips, such as Storage Memory, Task Time (GC Time),
> Input, Shuffle Read, Shuffle Write and Blacklisted, but some columns still
> lack tooltips: RDD Blocks, Disk Used, Cores, Active Tasks, Failed Tasks,
> Complete Tasks and Total Tasks. Oddly, in the Executors section below, all
> of these column names do have tooltips.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29323) Add tooltip for The Executors Tab's column names in the Spark history server Page

2019-10-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-29323.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25994
[https://github.com/apache/spark/pull/25994]

> Add tooltip for The Executors Tab's column names in the Spark history server 
> Page
> -
>
> Key: SPARK-29323
> URL: https://issues.apache.org/jira/browse/SPARK-29323
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: liucht-inspur
>Assignee: liucht-inspur
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: image-2019-10-04-09-42-14-174.png
>
>
> On the Executors tab of the Spark history server page, the Summary section
> shows a row of column titles, but their formatting is irregular.
> Some column names have tooltips, such as Storage Memory, Task Time (GC Time),
> Input, Shuffle Read, Shuffle Write and Blacklisted, but some columns still
> lack tooltips: RDD Blocks, Disk Used, Cores, Active Tasks, Failed Tasks,
> Complete Tasks and Total Tasks. Oddly, in the Executors section below, all
> of these column names do have tooltips.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29404) Add some explanations about the color of execution bars on Web UI.

2019-10-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-29404.
--
Resolution: Won't Fix

> Add some explanations about the color of execution bars on Web UI.
> --
>
> Key: SPARK-29404
> URL: https://issues.apache.org/jira/browse/SPARK-29404
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Tomoko Komiyama
>Priority: Trivial
>
> The documentation does not explain why an execution bar's color changes
> when it is clicked in the Web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29328) Incorrect calculation mean seconds per month

2019-10-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-29328:
-
Labels:   (was: correctness)

> Incorrect calculation mean seconds per month
> 
>
> Key: SPARK-29328
> URL: https://issues.apache.org/jira/browse/SPARK-29328
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.4
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
>
> The existing implementation assumes 31 days per month, or 372 days per year,
> which is far from the correct number. Spark uses the proleptic Gregorian
> calendar by default (SPARK-26651), in which the average year is 365.2425 days
> long: https://en.wikipedia.org/wiki/Gregorian_calendar . The calculation
> needs to be fixed in at least 3 places, including:
> - GroupStateImpl.scala:167:val millisPerMonth = 
> TimeUnit.MICROSECONDS.toMillis(CalendarInterval.MICROS_PER_DAY) * 31
> - EventTimeWatermark.scala:32:val millisPerMonth = 
> TimeUnit.MICROSECONDS.toMillis(CalendarInterval.MICROS_PER_DAY) * 31
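
For concreteness, a sketch of the correction the description implies (the
first value is the code quoted above; the mean-month constant is an
illustration, not the merged change):

{code:scala}
import java.util.concurrent.TimeUnit
import org.apache.spark.unsafe.types.CalendarInterval

// Current: assumes every month has exactly 31 days.
val millisPerMonth31 = TimeUnit.MICROSECONDS.toMillis(CalendarInterval.MICROS_PER_DAY) * 31

// Proleptic Gregorian mean: 365.2425 days / 12 months = 30.436875 days per month.
val millisPerMonthMean =
  (TimeUnit.MICROSECONDS.toMillis(CalendarInterval.MICROS_PER_DAY) * 30.436875).toLong
{code}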



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29328) Incorrect calculation mean seconds per month

2019-10-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-29328.
--
Fix Version/s: (was: 3.0.0)
   Resolution: Won't Fix

> Incorrect calculation mean seconds per month
> 
>
> Key: SPARK-29328
> URL: https://issues.apache.org/jira/browse/SPARK-29328
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.4
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
>  Labels: correctness
>
> The existing implementation assumes 31 days per month, or 372 days per year,
> which is far from the correct number. Spark uses the proleptic Gregorian
> calendar by default (SPARK-26651), in which the average year is 365.2425 days
> long: https://en.wikipedia.org/wiki/Gregorian_calendar . The calculation
> needs to be fixed in at least 3 places, including:
> - GroupStateImpl.scala:167:val millisPerMonth = 
> TimeUnit.MICROSECONDS.toMillis(CalendarInterval.MICROS_PER_DAY) * 31
> - EventTimeWatermark.scala:32:val millisPerMonth = 
> TimeUnit.MICROSECONDS.toMillis(CalendarInterval.MICROS_PER_DAY) * 31



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-29328) Incorrect calculation mean seconds per month

2019-10-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reopened SPARK-29328:
--

Reverted in https://github.com/apache/spark/pull/26101

> Incorrect calculation mean seconds per month
> 
>
> Key: SPARK-29328
> URL: https://issues.apache.org/jira/browse/SPARK-29328
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.4
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
>  Labels: correctness
> Fix For: 3.0.0
>
>
> The existing implementation assumes 31 days per month, or 372 days per year,
> which is far from the correct number. Spark uses the proleptic Gregorian
> calendar by default (SPARK-26651), in which the average year is 365.2425 days
> long: https://en.wikipedia.org/wiki/Gregorian_calendar . The calculation
> needs to be fixed in at least 3 places, including:
> - GroupStateImpl.scala:167:val millisPerMonth = 
> TimeUnit.MICROSECONDS.toMillis(CalendarInterval.MICROS_PER_DAY) * 31
> - EventTimeWatermark.scala:32:val millisPerMonth = 
> TimeUnit.MICROSECONDS.toMillis(CalendarInterval.MICROS_PER_DAY) * 31



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29410) Update Commons BeanUtils to 1.9.4

2019-10-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-29410:
-
Fix Version/s: 2.4.5

> Update Commons BeanUtils to 1.9.4
> -
>
> Key: SPARK-29410
> URL: https://issues.apache.org/jira/browse/SPARK-29410
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Minor
> Fix For: 2.4.5, 3.0.0
>
>
> Update commons-beanutils to 1.9.4 to fix CVE: 
> [http://commons.apache.org/proper/commons-beanutils/javadocs/v1.9.4/RELEASE-NOTES.txt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29380) RFormula avoid repeated 'first' jobs to get vector size

2019-10-12 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng reassigned SPARK-29380:


Assignee: zhengruifeng

> RFormula avoid repeated 'first' jobs to get vector size
> ---
>
> Key: SPARK-29380
> URL: https://issues.apache.org/jira/browse/SPARK-29380
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
>
> In the current implementation, {{RFormula}} triggers a {{first}} job to get
> the vector size when the size cannot be obtained from {{AttributeGroup}}.
> This can be optimized by getting the first row lazily and reusing it for
> each vector column.
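
A hypothetical sketch of the optimization described above ({{dataset}} and
{{vectorCols}}, a Seq[String] of vector column names, are assumed names, not
the actual {{RFormula}} internals):

{code:scala}
import org.apache.spark.ml.linalg.Vector

// Fetch the first row lazily: the `first` job runs at most once, and its
// result is shared by every vector column whose size is missing from the
// AttributeGroup metadata.
lazy val firstRow = dataset.select(vectorCols.head, vectorCols.tail: _*).first()

def vectorSize(colName: String): Int =
  firstRow.getAs[Vector](colName).size
{code}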



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29380) RFormula avoid repeated 'first' jobs to get vector size

2019-10-12 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng resolved SPARK-29380.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26052
[https://github.com/apache/spark/pull/26052]

> RFormula avoid repeated 'first' jobs to get vector size
> ---
>
> Key: SPARK-29380
> URL: https://issues.apache.org/jira/browse/SPARK-29380
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
> Fix For: 3.0.0
>
>
> In the current implementation, {{RFormula}} triggers a {{first}} job to get
> the vector size when the size cannot be obtained from {{AttributeGroup}}.
> This can be optimized by getting the first row lazily and reusing it for
> each vector column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29410) Update Commons BeanUtils to 1.9.4

2019-10-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-29410:


Assignee: Peter Toth

> Update Commons BeanUtils to 1.9.4
> -
>
> Key: SPARK-29410
> URL: https://issues.apache.org/jira/browse/SPARK-29410
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Minor
>
> Update commons-beanutils to 1.9.4 to fix CVE: 
> [http://commons.apache.org/proper/commons-beanutils/javadocs/v1.9.4/RELEASE-NOTES.txt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29410) Update Commons BeanUtils to 1.9.4

2019-10-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-29410.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26069
[https://github.com/apache/spark/pull/26069]

> Update Commons BeanUtils to 1.9.4
> -
>
> Key: SPARK-29410
> URL: https://issues.apache.org/jira/browse/SPARK-29410
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Minor
> Fix For: 3.0.0
>
>
> Update commons-beanutils to 1.9.4 to fix CVE: 
> [http://commons.apache.org/proper/commons-beanutils/javadocs/v1.9.4/RELEASE-NOTES.txt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28810) Document SHOW TABLES in SQL Reference.

2019-10-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-28810.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25561
[https://github.com/apache/spark/pull/25561]

> Document SHOW TABLES in SQL Reference.
> --
>
> Key: SPARK-28810
> URL: https://issues.apache.org/jira/browse/SPARK-28810
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Assignee: Shivu Sondur
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28810) Document SHOW TABLES in SQL Reference.

2019-10-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-28810:


Assignee: Shivu Sondur

> Document SHOW TABLES in SQL Reference.
> --
>
> Key: SPARK-28810
> URL: https://issues.apache.org/jira/browse/SPARK-28810
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Assignee: Shivu Sondur
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29116) Refactor py classes related to DecisionTree

2019-10-12 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng resolved SPARK-29116.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25929
[https://github.com/apache/spark/pull/25929]

> Refactor py classes related to DecisionTree
> ---
>
> Key: SPARK-29116
> URL: https://issues.apache.org/jira/browse/SPARK-29116
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: Huaxin Gao
>Priority: Minor
> Fix For: 3.0.0
>
>
> 1. Like the Scala side, move the related classes to a separate file, 'tree.py'.
> 2. Add the method 'predictLeaf' to 'DecisionTreeModel' and 'TreeEnsembleModel'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29116) Refactor py classes related to DecisionTree

2019-10-12 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng reassigned SPARK-29116:


Assignee: Huaxin Gao

> Refactor py classes related to DecisionTree
> ---
>
> Key: SPARK-29116
> URL: https://issues.apache.org/jira/browse/SPARK-29116
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: Huaxin Gao
>Priority: Minor
>
> 1. Like the Scala side, move the related classes to a separate file, 'tree.py'.
> 2. Add the method 'predictLeaf' to 'DecisionTreeModel' and 'TreeEnsembleModel'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29448) Support the `INTERVAL` type by Parquet datasource

2019-10-12 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29448:
--

 Summary: Support the `INTERVAL` type by Parquet datasource
 Key: SPARK-29448
 URL: https://issues.apache.org/jira/browse/SPARK-29448
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


The Parquet format allows storing intervals as a triple of (months, days,
milliseconds); see
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#interval .
The `INTERVAL` logical type is used for an interval of time. _It must annotate 
a fixed_len_byte_array of length 12. This array stores three little-endian 
unsigned integers that represent durations at different granularities of time. 
The first stores a number in months, the second stores a number in days, and 
the third stores a number in milliseconds. This representation is independent 
of any particular timezone or date._

We need to support writing and reading values of Catalyst's CalendarIntervalType.
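
A sketch of the 12-byte layout described above (the helpers are illustrative;
the actual datasource integration would live in Spark's Parquet read/write
paths):

{code:scala}
import java.nio.{ByteBuffer, ByteOrder}

// Pack (months, days, milliseconds) into the fixed_len_byte_array(12)
// layout: three little-endian 32-bit integers, in that order.
def encodeInterval(months: Int, days: Int, millis: Int): Array[Byte] =
  ByteBuffer.allocate(12)
    .order(ByteOrder.LITTLE_ENDIAN)
    .putInt(months)
    .putInt(days)
    .putInt(millis)
    .array()

def decodeInterval(bytes: Array[Byte]): (Int, Int, Int) = {
  val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
  (buf.getInt, buf.getInt, buf.getInt)
}
{code}

Note that Catalyst's CalendarIntervalType tracks microseconds, so mapping it
onto this layout also needs a (lossy) microsecond-to-millisecond conversion.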



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-29447) Allow users to update the name of a column

2019-10-12 Thread Lantao Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin closed SPARK-29447.
--

> Allow users to update the name of a column
> --
>
> Key: SPARK-29447
> URL: https://issues.apache.org/jira/browse/SPARK-29447
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Lantao Jin
>Priority: Major
>
> Right now, the {{ALTER TABLE CHANGE COLUMN}} command only supports changing
> the comment of a non-partition column
> (https://issues.apache.org/jira/browse/SPARK-17910).
> Supporting renaming a column would be very useful. I will file a PR to address it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29447) Allow users to update the name of a column

2019-10-12 Thread Lantao Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin resolved SPARK-29447.

Resolution: Invalid

Closing, since we already have {{ALTER TABLE table1 RENAME COLUMN a.b.c TO x}}.

> Allow users to update the name of a column
> --
>
> Key: SPARK-29447
> URL: https://issues.apache.org/jira/browse/SPARK-29447
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Lantao Jin
>Priority: Major
>
> Right now, the {{ALTER TABLE CHANGE COLUMN}} command only supports changing
> the comment of a non-partition column
> (https://issues.apache.org/jira/browse/SPARK-17910).
> Supporting renaming a column would be very useful. I will file a PR to address it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29428) Can't persist/set None-valued param

2019-10-12 Thread Borys Biletskyy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949968#comment-16949968
 ] 

Borys Biletskyy commented on SPARK-29428:
-

Thanks for your input. Maybe it makes sense to mention this in the Python docs;
I missed any conventions there regarding None params. For me it was a way to
work around https://issues.apache.org/jira/browse/SPARK-29414.

From the following Params method I got the impression that None params are
acceptable.
{code:java}
def _set(self, **kwargs):
    """
    Sets user-supplied params.
    """
    for param, value in kwargs.items():
        p = getattr(self, param)
        if value is not None:
            try:
                value = p.typeConverter(value)
            except TypeError as e:
                raise TypeError('Invalid param value given for param "%s". %s'
                                % (p.name, e))
        self._paramMap[p] = value
    return self
{code}
That is not the case here, where None params are not accepted.
{code:java}
def set(self, param, value):
    """
    Sets a parameter in the embedded param map.
    """
    self._shouldOwn(param)
    try:
        value = param.typeConverter(value)
    except ValueError as e:
        raise ValueError('Invalid param value given for param "%s". %s' %
                         (param.name, e))
    self._paramMap[param] = value
{code}
 

 

> Can't persist/set None-valued param 
> 
>
> Key: SPARK-29428
> URL: https://issues.apache.org/jira/browse/SPARK-29428
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.3.2
>Reporter: Borys Biletskyy
>Priority: Major
>
> {code:java}
> import pytest
> from pyspark import keyword_only
> from pyspark.ml import Model
> from pyspark.sql import DataFrame
> from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable
> from pyspark.ml.param.shared import HasInputCol
> from pyspark.sql.functions import *
>
>
> class NoneParamTester(Model,
>                       HasInputCol,
>                       DefaultParamsReadable,
>                       DefaultParamsWritable
>                       ):
>     @keyword_only
>     def __init__(self, inputCol: str = None):
>         super(NoneParamTester, self).__init__()
>         kwargs = self._input_kwargs
>         self.setParams(**kwargs)
>
>     @keyword_only
>     def setParams(self, inputCol: str = None):
>         kwargs = self._input_kwargs
>         self._set(**kwargs)
>         return self
>
>     def _transform(self, data: DataFrame) -> DataFrame:
>         return data
>
>
> class TestNoneParam(object):
>     def test_persist_none(self, spark, temp_dir):
>         path = temp_dir + '/test_model'
>         model = NoneParamTester(inputCol=None)
>         assert model.isDefined(model.inputCol)
>         assert model.isSet(model.inputCol)
>         assert model.getInputCol() is None
>         model.write().overwrite().save(path)
>         NoneParamTester.load(path)  # TypeError: Could not convert <class 'NoneType'> to string type
>
>     def test_set_none(self, spark):
>         model = NoneParamTester(inputCol=None)
>         assert model.isDefined(model.inputCol)
>         assert model.isSet(model.inputCol)
>         assert model.getInputCol() is None
>         model.set(model.inputCol, None)  # TypeError: Could not convert <class 'NoneType'> to string type
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29447) Allow users to update the name of a column

2019-10-12 Thread Lantao Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin updated SPARK-29447:
---
Description: 
Right now, the {{ALTER TABLE CHANGE COLUMN}} command only supports changing
the comment of a non-partition column
(https://issues.apache.org/jira/browse/SPARK-17910).

Supporting renaming a column would be very useful. I will file a PR to address it.

  was:
Right now, the {{ALTER TABLE CHANGE COLUMN}} command only supports changing
the comment of a non-partition column
(https://issues.apache.org/jira/browse/SPARK-17910).

Supporting renaming a column would be very useful.


> Allow users to update the name of a column
> --
>
> Key: SPARK-29447
> URL: https://issues.apache.org/jira/browse/SPARK-29447
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Lantao Jin
>Priority: Major
>
> Right now, the {{ALTER TABLE CHANGE COLUMN}} command only supports changing
> the comment of a non-partition column
> (https://issues.apache.org/jira/browse/SPARK-17910).
> Supporting renaming a column would be very useful. I will file a PR to address it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29447) Allow users to update the name of a column

2019-10-12 Thread Lantao Jin (Jira)
Lantao Jin created SPARK-29447:
--

 Summary: Allow users to update the name of a column
 Key: SPARK-29447
 URL: https://issues.apache.org/jira/browse/SPARK-29447
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.0.0
Reporter: Lantao Jin


Right now, the {{ALTER TABLE CHANGE COLUMN}} command only supports changing
the comment of a non-partition column
(https://issues.apache.org/jira/browse/SPARK-17910).

Supporting renaming a column would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29441) Unable to Alter table column type in spark.

2019-10-12 Thread Lantao Jin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949961#comment-16949961
 ] 

Lantao Jin commented on SPARK-29441:


The ALTER TABLE CHANGE COLUMN command only supports changing the comment of a
non-partition column for now.

> Unable to Alter table column type in spark.
> ---
>
> Key: SPARK-29441
> URL: https://issues.apache.org/jira/browse/SPARK-29441
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.1
> Environment: spark -2.3
> hadoop -2.4
>Reporter: prudhviraj
>Priority: Major
>
> Unable to alter a table column's type in Spark.
> scala> spark.sql("""alter table tablename change col1 col1 string""")
> org.apache.spark.sql.AnalysisException: ALTER TABLE CHANGE COLUMN is not 
> supported for changing column 'col1' with type 'LongType' to 'col1' with type 
> 'StringType';



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29446) Upgrade netty-all to 4.1.42 and fix vulnerabilities.

2019-10-12 Thread Iskender Unlu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949958#comment-16949958
 ] 

Iskender Unlu commented on SPARK-29446:
---

I will take a look at this issue.

> Upgrade netty-all to 4.1.42 and fix vulnerabilities.
> 
>
> Key: SPARK-29446
> URL: https://issues.apache.org/jira/browse/SPARK-29446
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current code uses io.netty:netty-all:jar:4.1.17, which has a known
> security vulnerability; see
> [https://www.tenable.com/cve/CVE-2019-16869].
> This reference recommends upgrading netty-all to 4.1.42 or later.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29446) Upgrade netty-all to 4.1.42 and fix vulnerabilities.

2019-10-12 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-29446:
--

 Summary: Upgrade netty-all to 4.1.42 and fix vulnerabilities.
 Key: SPARK-29446
 URL: https://issues.apache.org/jira/browse/SPARK-29446
 Project: Spark
  Issue Type: Dependency upgrade
  Components: Build
Affects Versions: 3.0.0
Reporter: jiaan.geng


The current code uses io.netty:netty-all:jar:4.1.17, which has a known
security vulnerability; see
[https://www.tenable.com/cve/CVE-2019-16869].
This reference recommends upgrading netty-all to 4.1.42 or later.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org