[jira] [Commented] (SPARK-29503) MapObjects doesn't copy Unsafe data when nested under Safe data
[ https://issues.apache.org/jira/browse/SPARK-29503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955090#comment-16955090 ] Jungtaek Lim commented on SPARK-29503: -- Thanks for reporting the issue with such detailed information! I've submitted a PR based on your observation. Please take a look. > MapObjects doesn't copy Unsafe data when nested under Safe data > --- > > Key: SPARK-29503 > URL: https://issues.apache.org/jira/browse/SPARK-29503 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 3.0.0 >Reporter: Aaron Lewis >Priority: Major > Labels: correctness > > In order for MapObjects to operate safely, it checks whether the result of > the mapping function is an Unsafe type (UnsafeRow, UnsafeArrayData, > UnsafeMapData) and performs a copy before writing it into MapObjects' output > array. This protects against expressions which reuse the same native > memory buffer to represent their results across evaluations; if the copy weren't > there, all results would point to the same native buffer and would > represent the last result written to the buffer. However, MapObjects misses > this needed copy if the Unsafe data is nested below some safe structure, for > instance a GenericArrayData whose elements are all UnsafeRows. In this > scenario, all elements of the GenericArrayData will point to the same > native UnsafeRow buffer, which will hold the last value written to it. > > Right now, this bug seems to occur only when a `ProjectExec` goes down the > `execute` path, as opposed to WholeStageCodegen's `produce` and `consume` > path. 
> 
> Example Reproduction Code:
> {code:scala}
> import org.apache.spark.sql.catalyst.expressions.objects.MapObjects
> import org.apache.spark.sql.catalyst.expressions.CreateArray
> import org.apache.spark.sql.catalyst.expressions.Expression
> import org.apache.spark.sql.functions.{array, struct}
> import org.apache.spark.sql.Column
> import org.apache.spark.sql.types.ArrayType
> 
> // For the purpose of demonstration, we need to disable WholeStage codegen
> spark.conf.set("spark.sql.codegen.wholeStage", "false")
> 
> val exampleDS = spark.sparkContext.parallelize(Seq(Seq(1, 2, 3))).toDF("items")
> 
> // Trivial example: Nest unsafe struct inside safe array
> // items: Seq[Int] => items.map{item => Seq(Struct(item))}
> val result = exampleDS.select(
>   new Column(MapObjects(
>     {item: Expression => array(struct(new Column(item))).expr},
>     $"items".expr,
>     exampleDS.schema("items").dataType.asInstanceOf[ArrayType].elementType
>   )) as "items"
> )
> result.show(10, false)
> {code}
> 
> Actual Output:
> {code:java}
> +---------------------------------------------------------+
> |items                                                    |
> +---------------------------------------------------------+
> |[WrappedArray([3]), WrappedArray([3]), WrappedArray([3])]|
> +---------------------------------------------------------+
> {code}
> 
> Expected Output:
> {code:java}
> +---------------------------------------------------------+
> |items                                                    |
> +---------------------------------------------------------+
> |[WrappedArray([1]), WrappedArray([2]), WrappedArray([3])]|
> +---------------------------------------------------------+
> {code}
> 
> We've confirmed that the bug exists on version 2.1.1 as well as on master
> (which I assume corresponds to version 3.0.0?)
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
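Conceptually, the bug above is plain buffer aliasing. The following minimal Python sketch (purely illustrative; none of these names are Spark APIs) mimics an expression that reuses one result buffer across evaluations: storing raw references reproduces the "all rows equal the last value" symptom from the Actual Output, while copying before appending yields the Expected Output.

```python
# A producer that reuses a single mutable buffer across evaluations,
# the way an Unsafe-producing expression reuses its native buffer.
def make_producer():
    buf = [None]  # one shared buffer, overwritten on every call

    def produce(x):
        buf[0] = x
        return buf  # always returns the SAME list object

    return produce

def map_objects(items, copy_results):
    """Map `produce` over `items`, either copying each result (safe)
    or storing the raw reference (the bug)."""
    produce = make_producer()
    out = []
    for x in items:
        result = produce(x)
        # The copy is the crucial step MapObjects misses for nested data.
        out.append(list(result) if copy_results else result)
    return out

print(map_objects([1, 2, 3], copy_results=False))  # [[3], [3], [3]]
print(map_objects([1, 2, 3], copy_results=True))   # [[1], [2], [3]]
```

In MapObjects' terms, `list(result)` plays the role of the copy that must also happen when the Unsafe result is nested inside a safe container such as GenericArrayData.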
[jira] [Created] (SPARK-29517) TRUNCATE TABLE should look up catalog/table like v2 commands
L. C. Hsieh created SPARK-29517: --- Summary: TRUNCATE TABLE should look up catalog/table like v2 commands Key: SPARK-29517 URL: https://issues.apache.org/jira/browse/SPARK-29517 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: L. C. Hsieh TRUNCATE TABLE should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29517) TRUNCATE TABLE should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-29517: --- Assignee: L. C. Hsieh > TRUNCATE TABLE should look up catalog/table like v2 commands > > > Key: SPARK-29517 > URL: https://issues.apache.org/jira/browse/SPARK-29517 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > TRUNCATE TABLE should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29512) REPAIR TABLE should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-29512. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26168 [https://github.com/apache/spark/pull/26168] > REPAIR TABLE should look up catalog/table like v2 commands > -- > > Key: SPARK-29512 > URL: https://issues.apache.org/jira/browse/SPARK-29512 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.0.0 > > > REPAIR TABLE should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29512) REPAIR TABLE should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-29512: --- Assignee: Terry Kim > REPAIR TABLE should look up catalog/table like v2 commands > -- > > Key: SPARK-29512 > URL: https://issues.apache.org/jira/browse/SPARK-29512 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > > REPAIR TABLE should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29516) Test ThriftServerQueryTestSuite asynchronously
Yuming Wang created SPARK-29516: --- Summary: Test ThriftServerQueryTestSuite asynchronously Key: SPARK-29516 URL: https://issues.apache.org/jira/browse/SPARK-29516 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang spark.sql.hive.thriftServer.async -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark
[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955061#comment-16955061 ] zhao bo commented on SPARK-29106: - Thanks [~shaneknapp]. It's great that you will share the full jenkins configuration test code with us. ;) We are very happy with the test result of the first periodic arm test job. Building a powerful ARM testing architecture is still hard work for us; our team plans to integrate more, higher-performance ARM VMs into the community to support PullRequest-triggered testing jobs, and more work to improve test execution to meet the PullRequest trigger requirements is still ahead of us. Now let's see the test result. > Add jenkins arm test for spark > -- > > Key: SPARK-29106 > URL: https://issues.apache.org/jira/browse/SPARK-29106 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.0.0 >Reporter: huangtianhua >Priority: Minor > > Add arm test jobs to amplab jenkins for spark. > So far we have made two periodic arm test jobs for spark in OpenLab: one is > based on master with hadoop 2.7 (similar to the QA test of amplab jenkins), > the other is based on a new branch which we made on 09-09, see > [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64] > and > [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64.|http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64] > We only have to care about the first one when integrating the arm test with amplab > jenkins. > About the k8s test on arm, we have tested it, see > [https://github.com/theopenlab/spark/pull/17]; maybe we can integrate it > later. > And we plan to test on other stable branches too, and we can integrate them into > amplab when they are ready. 
> We have offered an arm instance and sent the info to shane knapp; thanks > shane for adding the first arm job to amplab jenkins :) > The other important thing is about leveldbjni > [https://github.com/fusesource/leveldbjni,|https://github.com/fusesource/leveldbjni/issues/80] > spark depends on leveldbjni-all-1.8 > [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8], > and we can see there is no arm64 support. So we built an arm64-supporting > release of leveldbjni, see > [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8], > but we can't modify the spark pom.xml directly with something like > 'property'/'profile' to choose the correct jar package on arm or x86 platforms, > because spark depends on some hadoop packages like hadoop-hdfs, and those packages > depend on leveldbjni-all-1.8 too, unless hadoop releases with a new arm-supporting > leveldbjni jar. For now we download the leveldbjni-all-1.8 of > openlabtesting and 'mvn install' it when testing spark on arm. > PS: The issues found and fixed: > SPARK-28770 > [https://github.com/apache/spark/pull/25673] > > SPARK-28519 > [https://github.com/apache/spark/pull/25279] > > SPARK-28433 > [https://github.com/apache/spark/pull/25186] > > SPARK-28467 > [https://github.com/apache/spark/pull/25864] > > SPARK-29286 > [https://github.com/apache/spark/pull/26021] > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
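For context on the 'property'/'profile' idea the description mentions, a per-platform Maven profile would ordinarily look like the hypothetical sketch below (the property name and activation are illustrative, not from Spark's pom.xml). As the description explains, this alone does not help Spark, because hadoop's transitive dependencies still pin org.fusesource.leveldbjni:leveldbjni-all:1.8.

```xml
<!-- Hypothetical sketch: select a platform-specific leveldbjni groupId
     via an OS-arch-activated profile. Not part of Spark's actual pom.xml. -->
<profiles>
  <profile>
    <id>arm64</id>
    <activation>
      <os>
        <arch>aarch64</arch>
      </os>
    </activation>
    <properties>
      <leveldbjni.group>org.openlabtesting.leveldbjni</leveldbjni.group>
    </properties>
  </profile>
</profiles>
<!-- A dependency could then reference ${leveldbjni.group}, but transitive
     hadoop dependencies on org.fusesource.leveldbjni would remain. -->
```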
[jira] [Updated] (SPARK-29494) ArrayOutOfBoundsException when converting from string to timestamp
[ https://issues.apache.org/jira/browse/SPARK-29494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-29494: - Fix Version/s: (was: 2.4.5) > ArrayOutOfBoundsException when converting from string to timestamp > -- > > Key: SPARK-29494 > URL: https://issues.apache.org/jira/browse/SPARK-29494 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: Rahul Shivu Mahadev >Assignee: Rahul Shivu Mahadev >Priority: Minor > Fix For: 3.0.0 > > > In a couple of scenarios while converting from String to Timestamp, > `DateTimeUtils.stringToTimestamp` throws an array out of bounds exception if > there are trailing spaces or a ':'. This method is required to > return `None` when the format of the string is incorrect. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29502) typed interval expression should fail for invalid format
[ https://issues.apache.org/jira/browse/SPARK-29502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-29502. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26151 [https://github.com/apache/spark/pull/26151] > typed interval expression should fail for invalid format > > > Key: SPARK-29502 > URL: https://issues.apache.org/jira/browse/SPARK-29502 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29466) Show `Duration` for running drivers in Standalone master web UI
[ https://issues.apache.org/jira/browse/SPARK-29466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29466. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26113 [https://github.com/apache/spark/pull/26113] > Show `Duration` for running drivers in Standalone master web UI > --- > > Key: SPARK-29466 > URL: https://issues.apache.org/jira/browse/SPARK-29466 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.0.0 > > > This issue aims to add a new column for `Duration` for running drivers table > in `Standalone` master web UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29466) Show `Duration` for running drivers in Standalone master web UI
[ https://issues.apache.org/jira/browse/SPARK-29466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-29466: - Assignee: Dongjoon Hyun > Show `Duration` for running drivers in Standalone master web UI > --- > > Key: SPARK-29466 > URL: https://issues.apache.org/jira/browse/SPARK-29466 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > > This issue aims to add a new column for `Duration` for running drivers table > in `Standalone` master web UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29494) ArrayOutOfBoundsException when converting from string to timestamp
[ https://issues.apache.org/jira/browse/SPARK-29494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-29494: Assignee: Rahul Shivu Mahadev > ArrayOutOfBoundsException when converting from string to timestamp > -- > > Key: SPARK-29494 > URL: https://issues.apache.org/jira/browse/SPARK-29494 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: Rahul Shivu Mahadev >Assignee: Rahul Shivu Mahadev >Priority: Minor > > In a couple of scenarios while converting from String to Timestamp, > `DateTimeUtils.stringToTimestamp` throws an array out of bounds exception if > there are trailing spaces or a ':'. This method is required to > return `None` when the format of the string is incorrect. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29494) ArrayOutOfBoundsException when converting from string to timestamp
[ https://issues.apache.org/jira/browse/SPARK-29494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-29494. -- Fix Version/s: 3.0.0 2.4.5 Resolution: Fixed Issue resolved by pull request 26143 [https://github.com/apache/spark/pull/26143] > ArrayOutOfBoundsException when converting from string to timestamp > -- > > Key: SPARK-29494 > URL: https://issues.apache.org/jira/browse/SPARK-29494 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: Rahul Shivu Mahadev >Assignee: Rahul Shivu Mahadev >Priority: Minor > Fix For: 2.4.5, 3.0.0 > > > In a couple of scenarios while converting from String to Timestamp, > `DateTimeUtils.stringToTimestamp` throws an array out of bounds exception if > there are trailing spaces or a ':'. This method is required to > return `None` when the format of the string is incorrect. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29494) ArrayOutOfBoundsException when converting from string to timestamp
[ https://issues.apache.org/jira/browse/SPARK-29494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-29494: - Priority: Minor (was: Major) > ArrayOutOfBoundsException when converting from string to timestamp > -- > > Key: SPARK-29494 > URL: https://issues.apache.org/jira/browse/SPARK-29494 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: Rahul Shivu Mahadev >Priority: Minor > > In a couple of scenarios while converting from String to Timestamp, > `DateTimeUtils.stringToTimestamp` throws an array out of bounds exception if > there are trailing spaces or a ':'. This method is required to > return `None` when the format of the string is incorrect. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
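The contract described in SPARK-29494 (return `None` on malformed input rather than let an index run past the end of the string) can be illustrated with a small, hypothetical Python sketch. This is not Spark's actual `DateTimeUtils.stringToTimestamp` implementation; it only demonstrates the expected return-None-on-bad-format behavior for inputs with trailing spaces or a dangling ':'.

```python
# Illustrative only: a parser in the spirit of stringToTimestamp that
# returns None for malformed input instead of raising.
from datetime import datetime
from typing import Optional

def string_to_timestamp(s: str) -> Optional[datetime]:
    try:
        return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Trailing spaces, a dangling ':', or any other malformed input
        # must yield None rather than an exception.
        return None

print(string_to_timestamp("2019-10-18 12:30:00"))   # → 2019-10-18 12:30:00
print(string_to_timestamp("2019-10-18 12:30:00:"))  # → None (dangling ':')
print(string_to_timestamp("2019-10-18 12:30:00 "))  # → None (trailing space)
```

The fix in PR 26143 enforces the same principle inside Spark: bounds problems on such inputs surface as a `None` result, not an ArrayIndexOutOfBoundsException.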
[jira] [Assigned] (SPARK-29515) MapStatuses SerDeser Benchmark
[ https://issues.apache.org/jira/browse/SPARK-29515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai reassigned SPARK-29515: --- Assignee: DB Tsai > MapStatuses SerDeser Benchmark > -- > > Key: SPARK-29515 > URL: https://issues.apache.org/jira/browse/SPARK-29515 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 2.4.4 >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29515) MapStatuses SerDeser Benchmark
[ https://issues.apache.org/jira/browse/SPARK-29515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai resolved SPARK-29515. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26169 [https://github.com/apache/spark/pull/26169] > MapStatuses SerDeser Benchmark > -- > > Key: SPARK-29515 > URL: https://issues.apache.org/jira/browse/SPARK-29515 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 2.4.4 >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29515) MapStatuses SerDeser Benchmark
[ https://issues.apache.org/jira/browse/SPARK-29515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai updated SPARK-29515: Affects Version/s: (was: 3.0.0) 2.4.4 > MapStatuses SerDeser Benchmark > -- > > Key: SPARK-29515 > URL: https://issues.apache.org/jira/browse/SPARK-29515 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 2.4.4 >Reporter: DB Tsai >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29515) MapStatuses SerDeser Benchmark
DB Tsai created SPARK-29515: --- Summary: MapStatuses SerDeser Benchmark Key: SPARK-29515 URL: https://issues.apache.org/jira/browse/SPARK-29515 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.0.0 Reporter: DB Tsai -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark
[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954912#comment-16954912 ] Shane Knapp commented on SPARK-29106: - also, i will be exploring the purchase of an ARM server for our cluster. the VM is just not going to be enough for our purposes. this won't happen immediately, so we'll use the VM until then. > Add jenkins arm test for spark > -- > > Key: SPARK-29106 > URL: https://issues.apache.org/jira/browse/SPARK-29106 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.0.0 >Reporter: huangtianhua >Priority: Minor -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark
[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954904#comment-16954904 ] Shane Knapp commented on SPARK-29106: - i'm actually not going to use the script – the testing code will be in the jenkins job config: [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/] once i get the build config sorted and working as expected i'll be sure to give you all a copy. :) > Add jenkins arm test for spark > -- > > Key: SPARK-29106 > URL: https://issues.apache.org/jira/browse/SPARK-29106 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.0.0 >Reporter: huangtianhua >Priority: Minor -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29503) MapObjects doesn't copy Unsafe data when nested under Safe data
[ https://issues.apache.org/jira/browse/SPARK-29503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh updated SPARK-29503: Description: In order for MapObjects to operate safely, it checks to see if the result of the mapping function is an Unsafe type (UnsafeRow, UnsafeArrayData, UnsafeMapData) and performs a copy before writing it into MapObjects' output array. This is to protect against expressions which re-use the same native memory buffer to represent its result across evaluations; if the copy wasn't here, all results would be pointing to the same native buffer and would represent the last result written to the buffer. However, MapObjects misses this needed copy if the Unsafe data is nested below some safe structure, for instance a GenericArrrayData whose elements are all UnsafeRows. In this scenario, all elements of the GenericArrayData will be pointing to the same native UnsafeRow buffer which will hold the last value written to it. Right now, this bug seems to only occur when a `ProjectExec` goes down the `execute` path, as opposed to WholeStageCodegen's `produce` and `consume` path. 
Example Reproduction Code: {code:scala} import org.apache.spark.sql.catalyst.expressions.objects.MapObjects import org.apache.spark.sql.catalyst.expressions.CreateArray import org.apache.spark.sql.catalyst.expressions.Expression import org.apache.spark.sql.functions.{array, struct} import org.apache.spark.sql.Column import org.apache.spark.sql.types.ArrayType // For the purpose of demonstration, we need to disable WholeStage codegen spark.conf.set("spark.sql.codegen.wholeStage", "false") val exampleDS = spark.sparkContext.parallelize(Seq(Seq(1, 2, 3))).toDF("items") // Trivial example: Nest unsafe struct inside safe array // items: Seq[Int] => items.map{item => Seq(Struct(item))} val result = exampleDS.select( new Column(MapObjects( {item: Expression => array(struct(new Column(item))).expr}, $"items".expr, exampleDS.schema("items").dataType.asInstanceOf[ArrayType].elementType )) as "items" ) result.show(10, false) {code} Actual Output: {code:java} +-+ |items| +-+ |[WrappedArray([3]), WrappedArray([3]), WrappedArray([3])]| +-+ {code} Expected Output: {code:java} +-+ |items| +-+ |[WrappedArray([1]), WrappedArray([2]), WrappedArray([3])]| +-+ {code} We've confirmed that the bug exists on version 2.1.1 as well as on master (which I assume corresponds to version 3.0.0?) was: *strong text*In order for MapObjects to operate safely, it checks to see if the result of the mapping function is an Unsafe type (UnsafeRow, UnsafeArrayData, UnsafeMapData) and performs a copy before writing it into MapObjects' output array. This is to protect against expressions which re-use the same native memory buffer to represent its result across evaluations; if the copy wasn't here, all results would be pointing to the same native buffer and would represent the last result written to the buffer. However, MapObjects misses this needed copy if the Unsafe data is nested below some safe structure, for instance a GenericArrrayData whose elements are all UnsafeRows. 
In this scenario, all elements of the GenericArrayData will be pointing to the same native UnsafeRow buffer which will hold the last value written to it. Right now, this bug seems to only occur when a `ProjectExec` goes down the `execute` path, as opposed to WholeStageCodegen's `produce` and `consume` path. Example Reproduction Code: {code:scala} import org.apache.spark.sql.catalyst.expressions.objects.MapObjects import org.apache.spark.sql.catalyst.expressions.CreateArray import org.apache.spark.sql.catalyst.expressions.Expression import org.apache.spark.sql.functions.{array, struct} import org.apache.spark.sql.Column import org.apache.spark.sql.types.ArrayType // For the purpose of demonstration, we need to disable WholeStage codegen spark.conf.set("spark.sql.codegen.wholeStage", "false") val exampleDS = spark.sparkContext.parallelize(Seq(Seq(1, 2, 3))).toDF("items") // Trivial example: Nest unsafe struct inside safe array // items: Seq[Int] => items.map{item => Seq(Struct(item))} val result = exampleDS.select( new Column(MapObjects( {item: Expression => array(struct(new Column(item))).expr}, $"items".expr, exampleDS.schema("items").dataType.asInstanceOf[ArrayType].elementType )) as "items" ) result.show(10, false) {code} Actual Output: {code:java
[jira] [Updated] (SPARK-29503) MapObjects doesn't copy Unsafe data when nested under Safe data
[ https://issues.apache.org/jira/browse/SPARK-29503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh updated SPARK-29503: Description: *strong text*In order for MapObjects to operate safely, it checks to see if the result of the mapping function is an Unsafe type (UnsafeRow, UnsafeArrayData, UnsafeMapData) and performs a copy before writing it into MapObjects' output array. This is to protect against expressions which re-use the same native memory buffer to represent its result across evaluations; if the copy wasn't here, all results would be pointing to the same native buffer and would represent the last result written to the buffer. However, MapObjects misses this needed copy if the Unsafe data is nested below some safe structure, for instance a GenericArrrayData whose elements are all UnsafeRows. In this scenario, all elements of the GenericArrayData will be pointing to the same native UnsafeRow buffer which will hold the last value written to it. Right now, this bug seems to only occur when a `ProjectExec` goes down the `execute` path, as opposed to WholeStageCodegen's `produce` and `consume` path. 
Example Reproduction Code:
{code:scala}
import org.apache.spark.sql.catalyst.expressions.objects.MapObjects
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.functions.{array, struct}
import org.apache.spark.sql.Column
import org.apache.spark.sql.types.ArrayType

// For the purpose of demonstration, we need to disable WholeStage codegen
spark.conf.set("spark.sql.codegen.wholeStage", "false")

val exampleDS = spark.sparkContext.parallelize(Seq(Seq(1, 2, 3))).toDF("items")

// Trivial example: nest an unsafe struct inside a safe array
// items: Seq[Int] => items.map{item => Seq(Struct(item))}
val result = exampleDS.select(
  new Column(MapObjects(
    { item: Expression => array(struct(new Column(item))).expr },
    $"items".expr,
    exampleDS.schema("items").dataType.asInstanceOf[ArrayType].elementType
  )) as "items"
)
result.show(10, false)
{code}
Actual Output:
{code:java}
+---------------------------------------------------------+
|items                                                    |
+---------------------------------------------------------+
|[WrappedArray([3]), WrappedArray([3]), WrappedArray([3])]|
+---------------------------------------------------------+
{code}
Expected Output:
{code:java}
+---------------------------------------------------------+
|items                                                    |
+---------------------------------------------------------+
|[WrappedArray([1]), WrappedArray([2]), WrappedArray([3])]|
+---------------------------------------------------------+
{code}
We've confirmed that the bug exists on version 2.1.1 as well as on master (which I assume corresponds to version 3.0.0?)

was: In order for MapObjects to operate safely, it checks to see if the result of the mapping function is an Unsafe type (UnsafeRow, UnsafeArrayData, UnsafeMapData) and performs a copy before writing it into MapObjects' output array. This is to protect against expressions which re-use the same native memory buffer to represent its result across evaluations; if the copy wasn't here, all results would be pointing to the same native buffer and would represent the last result written to the buffer. However, MapObjects misses this needed copy if the Unsafe data is nested below some safe structure, for instance a GenericArrayData whose elements are all UnsafeRows.
In this scenario, all elements of the GenericArrayData will be pointing to the same native UnsafeRow buffer which will hold the last value written to it. Right now, this bug seems to only occur when a `ProjectExec` goes down the `execute` path, as opposed to WholeStageCodegen's `produce` and `consume` path.
Example Reproduction Code:
{code:scala}
import org.apache.spark.sql.catalyst.expressions.objects.MapObjects
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.functions.{array, struct}
import org.apache.spark.sql.Column
import org.apache.spark.sql.types.ArrayType

// For the purpose of demonstration, we need to disable WholeStage codegen
spark.conf.set("spark.sql.codegen.wholeStage", "false")

val exampleDS = spark.sparkContext.parallelize(Seq(Seq(1, 2, 3))).toDF("items")

// Trivial example: nest an unsafe struct inside a safe array
// items: Seq[Int] => items.map{item => Seq(Struct(item))}
val result = exampleDS.select(
  new Column(MapObjects(
    { item: Expression => array(struct(new Column(item))).expr },
    $"items".expr,
    exampleDS.schema("items").dataType.asInstanceOf[ArrayType].elementType
  )) as "items"
)
result.show(10, false)
{code}
Actual Output:
{code:java}
+---------------------------------------------------------+
|items                                                    |
+---------------------------------------------------------+
|[WrappedArray([3]), WrappedArray([3]), WrappedArray([3])]|
+---------------------------------------------------------+
{code}
Expected Output:
{code:java}
+---------------------------------------------------------+
|items                                                    |
+---------------------------------------------------------+
|[WrappedArray([1]), WrappedArray([2]), WrappedArray([3])]|
+---------------------------------------------------------+
{code}
We've confirmed that the bug exists on version 2.1.1 as well as on master (which I assume corresponds to version 3.0.0?)
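The aliasing problem described above can be illustrated outside of Spark. The sketch below uses a hypothetical `ReusableBuffer` class (not a real Spark type) standing in for an UnsafeRow-style buffer that an expression reuses across evaluations; without a defensive copy per element, every element of the collected result aliases the same buffer and observes the last value written:

```scala
// Hypothetical stand-in for an UnsafeRow-style buffer that an
// expression reuses across evaluations (not a real Spark class).
final class ReusableBuffer {
  var value: Int = 0
  def copy(): ReusableBuffer = {
    val c = new ReusableBuffer
    c.value = value
    c
  }
}

val buf = new ReusableBuffer

// Without copying: every element aliases `buf`, so all elements
// end up showing the last value written (the reported bug).
val aliased = Seq(1, 2, 3).map { i => buf.value = i; buf }

// With a defensive copy per element: each element keeps its own value,
// which is what MapObjects is supposed to guarantee even when the
// Unsafe data is nested under safe data.
val copied = Seq(1, 2, 3).map { i => buf.value = i; buf.copy() }
```

Here `aliased.map(_.value)` yields `Seq(3, 3, 3)` while `copied.map(_.value)` yields `Seq(1, 2, 3)`, mirroring the actual vs. expected output in the report.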
[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark
[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954879#comment-16954879 ] zhao bo commented on SPARK-29106: - Also, if possible and you have time, could you please help us improve the test script and make it follow a better test process? For example, you mentioned installing test deps before testing some modules. Thanks very much, @shane
> Add jenkins arm test for spark
> --
>
> Key: SPARK-29106
> URL: https://issues.apache.org/jira/browse/SPARK-29106
> Project: Spark
> Issue Type: Test
> Components: Tests
> Affects Versions: 3.0.0
> Reporter: huangtianhua
> Priority: Minor
>
> Add ARM test jobs to amplab Jenkins for Spark.
> So far we have set up two periodic ARM test jobs for Spark in OpenLab: one is based on master with Hadoop 2.7 (similar to the QA test on amplab Jenkins), the other is based on a branch we cut on 09-09, see [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64] and [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64].
> We only have to care about the first one when integrating the ARM tests with amplab Jenkins.
> As for the k8s test on ARM, we have tried it, see [https://github.com/theopenlab/spark/pull/17]; maybe we can integrate it later.
> We also plan to test other stable branches, and we can integrate them into amplab when they are ready.
> We have offered an ARM instance and sent the info to Shane Knapp; thanks Shane for adding the first ARM job to amplab Jenkins :)
> The other important thing is the leveldbjni issue [https://github.com/fusesource/leveldbjni/issues/80]. Spark depends on leveldbjni-all-1.8 [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8], which has no arm64 support. So we built an arm64-supporting release of leveldbjni, see [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8]. But we can't modify the Spark pom.xml directly with something like a 'property'/'profile' to choose the correct jar on the ARM or x86 platform, because Spark depends on some Hadoop packages such as hadoop-hdfs that depend on leveldbjni-all-1.8 too, unless Hadoop releases a new ARM-supporting leveldbjni jar. For now we download the leveldbjni-all-1.8 from openlabtesting and 'mvn install' it when testing Spark on ARM.
> PS: The issues found and fixed:
> SPARK-28770 [https://github.com/apache/spark/pull/25673]
> SPARK-28519 [https://github.com/apache/spark/pull/25279]
> SPARK-28433 [https://github.com/apache/spark/pull/25186]
> SPARK-28467 [https://github.com/apache/spark/pull/25864]
> SPARK-29286 [https://github.com/apache/spark/pull/26021]
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29076) Generalize the PVTestSuite to no longer need the minikube tag
[ https://issues.apache.org/jira/browse/SPARK-29076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29076: -- Priority: Major (was: Trivial) > Generalize the PVTestSuite to no longer need the minikube tag > - > > Key: SPARK-29076 > URL: https://issues.apache.org/jira/browse/SPARK-29076 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 3.0.0 >Reporter: Holden Karau >Priority: Major > > Currently the PVTestSuite has the MiniKube test tag applied so it can be > skipped for non-minikube tests. It should be somewhat easily generalizable to > at least other local k8s test envs, however as written it depends on being > able to mount a local folder as a PV so may take more work to generalize to > arbitrary k8s. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark
[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954842#comment-16954842 ] zhao bo commented on SPARK-29106: - Thanks @shane. Correct, the test dependencies for PySpark and SparkR were installed when we tested the demo on the VM after your email. For now we can focus on the Maven tests; the PySpark and SparkR tests just show that both of them can succeed on the VM. If you see anything that could be improved, feel free to point it out and we will do our best to address it. And it would be great if the first periodic job could join your Jenkins env soon.
> Add jenkins arm test for spark
> --
>
> Key: SPARK-29106
> URL: https://issues.apache.org/jira/browse/SPARK-29106
> Project: Spark
> Issue Type: Test
> Components: Tests
> Affects Versions: 3.0.0
> Reporter: huangtianhua
> Priority: Minor
>
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29514) String function: string_to_array
Kent Yao created SPARK-29514: Summary: String function: string_to_array Key: SPARK-29514 URL: https://issues.apache.org/jira/browse/SPARK-29514 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Kent Yao
||Function||Return Type||Description||Example||Result||
|{{string_to_array(text, text [, text])}}|{{text[]}}|splits string into array elements using supplied delimiter and optional null string|{{string_to_array('xx~^~yy~^~zz', '~^~', 'yy')}}|{{ {xx,NULL,zz} }}|
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
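The PostgreSQL semantics in the table above can be modeled in a few lines of plain Scala. This is a sketch of the intended behavior, not Spark's or PostgreSQL's implementation: split on the delimiter as a literal (keeping trailing empty fields), and map any field equal to the optional null string to null:

```scala
import java.util.regex.Pattern

// Model of string_to_array(text, text [, text]): split on a literal
// delimiter, keep trailing empty fields (limit -1), and replace fields
// matching the optional null string with null.
def stringToArray(s: String, delimiter: String,
                  nullString: Option[String] = None): Array[String] =
  s.split(Pattern.quote(delimiter), -1).map { field =>
    if (nullString.contains(field)) null else field
  }
```

For the table's example, `stringToArray("xx~^~yy~^~zz", "~^~", Some("yy"))` produces `Array("xx", null, "zz")`, matching `{xx,NULL,zz}`.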
[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark
[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954809#comment-16954809 ] Shane Knapp commented on SPARK-29106: - we're definitely going to have an issue w/both the R and python tests, as it looks like none of the testing deps have been installed. we use anaconda python to manage our bare metal, so i'll have to see if i can make things work w/virtualenv. R, well, that's always a can of worms best left untouched.
> Add jenkins arm test for spark
> --
>
> Key: SPARK-29106
> URL: https://issues.apache.org/jira/browse/SPARK-29106
> Project: Spark
> Issue Type: Test
> Components: Tests
> Affects Versions: 3.0.0
> Reporter: huangtianhua
> Priority: Minor
>
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29507) Support ALTER TABLE SET OWNER command
[ https://issues.apache.org/jira/browse/SPARK-29507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954805#comment-16954805 ] Kent Yao commented on SPARK-29507: -- I am working on it > Support ALTER TABLE SET OWNER command > - > > Key: SPARK-29507 > URL: https://issues.apache.org/jira/browse/SPARK-29507 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Priority: Major > > see https://jira.apache.org/jira/browse/HIVE-18762 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27812) kubernetes client import non-daemon thread which block jvm exit.
[ https://issues.apache.org/jira/browse/SPARK-27812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27812: -- Fix Version/s: 2.4.5
> kubernetes client import non-daemon thread which block jvm exit.
> 
>
> Key: SPARK-27812
> URL: https://issues.apache.org/jira/browse/SPARK-27812
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 2.4.3, 2.4.4
> Reporter: Henry Yu
> Assignee: Igor Calabria
> Priority: Major
> Fix For: 2.4.5, 3.0.0
>
> I tried spark-submit to k8s in cluster mode. The driver pod failed to exit because of an OkHttp WebSocket non-daemon thread.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28420) Date/Time Functions: date_part for intervals
[ https://issues.apache.org/jira/browse/SPARK-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-28420. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25981 [https://github.com/apache/spark/pull/25981]
> Date/Time Functions: date_part for intervals
>
> Key: SPARK-28420
> URL: https://issues.apache.org/jira/browse/SPARK-28420
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
> Fix For: 3.0.0
>
> ||Function||Return Type||Description||Example||Result||
> |{{date_part(text, interval)}}|{{double precision}}|Get subfield (equivalent to {{extract}}); see [Section 9.9.1|https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT]|{{date_part('month', interval '2 years 3 months')}}|{{3}}|
> We can replace it with {{extract(field from timestamp)}}.
> https://www.postgresql.org/docs/11/functions-datetime.html
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
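For the year/month component of an interval, PostgreSQL effectively stores one total-month count, so `date_part` on that component reduces to integer division and remainder. The Scala sketch below models that semantics (an assumption about the behavior being ported, not Spark's actual code):

```scala
// Model of date_part on the year/month component of an interval,
// where the interval's years and months are stored as a single
// total-month count (e.g. '2 years 3 months' => 27 months).
def datePartOfInterval(field: String, totalMonths: Int): Int = field match {
  case "year"  => totalMonths / 12  // whole years
  case "month" => totalMonths % 12  // remaining months within the year
  case other   => throw new IllegalArgumentException(s"unsupported field: $other")
}
```

This reproduces the table's example: `date_part('month', interval '2 years 3 months')` is `3`, and `date_part('year', ...)` on the same interval is `2`.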
[jira] [Assigned] (SPARK-28420) Date/Time Functions: date_part for intervals
[ https://issues.apache.org/jira/browse/SPARK-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-28420: --- Assignee: Maxim Gekk (was: Yuming Wang)
> Date/Time Functions: date_part for intervals
>
> Key: SPARK-28420
> URL: https://issues.apache.org/jira/browse/SPARK-28420
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Assignee: Maxim Gekk
> Priority: Major
> Fix For: 3.0.0
>
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28420) Date/Time Functions: date_part for intervals
[ https://issues.apache.org/jira/browse/SPARK-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-28420: --- Assignee: Yuming Wang
> Date/Time Functions: date_part for intervals
>
> Key: SPARK-28420
> URL: https://issues.apache.org/jira/browse/SPARK-28420
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
>
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark
[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954705#comment-16954705 ] Shane Knapp commented on SPARK-29106: - re: real time logging -- yeah i noticed that. :) i'll look at that script and play around w/it today.
> Add jenkins arm test for spark
> --
>
> Key: SPARK-29106
> URL: https://issues.apache.org/jira/browse/SPARK-29106
> Project: Spark
> Issue Type: Test
> Components: Tests
> Affects Versions: 3.0.0
> Reporter: huangtianhua
> Priority: Minor
>
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29481) all the commands should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954688#comment-16954688 ] L. C. Hsieh commented on SPARK-29481: - Thanks for pinging me. Will spend some time on this over the weekend.
> all the commands should look up catalog/table like v2 commands
> --
>
> Key: SPARK-29481
> URL: https://issues.apache.org/jira/browse/SPARK-29481
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Wenchen Fan
> Priority: Major
>
> The newly added v2 commands support multiple catalogs and respect the current catalog/namespace. However, this is not true for the old v1 commands.
> This leads to very confusing behaviors, for example:
> {code}
> USE my_catalog
> DESC t          // succeeds and describes the table t from my_catalog
> ANALYZE TABLE t // reports table not found, as there is no table t in the session catalog
> {code}
> We should make sure all the commands have the same behavior regarding table resolution.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29513) REFRESH TABLE should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954675#comment-16954675 ] Terry Kim commented on SPARK-29513: --- I am working on this. > REFRESH TABLE should look up catalog/table like v2 commands > --- > > Key: SPARK-29513 > URL: https://issues.apache.org/jira/browse/SPARK-29513 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Terry Kim >Priority: Major > > REFRESH TABLE should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29513) REFRESH TABLE should look up catalog/table like v2 commands
Terry Kim created SPARK-29513: - Summary: REFRESH TABLE should look up catalog/table like v2 commands Key: SPARK-29513 URL: https://issues.apache.org/jira/browse/SPARK-29513 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Terry Kim REFRESH TABLE should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29512) REPAIR TABLE should look up catalog/table like v2 commands
Terry Kim created SPARK-29512: - Summary: REPAIR TABLE should look up catalog/table like v2 commands Key: SPARK-29512 URL: https://issues.apache.org/jira/browse/SPARK-29512 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Terry Kim REPAIR TABLE should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29512) REPAIR TABLE should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954672#comment-16954672 ] Terry Kim commented on SPARK-29512: --- I am working on this. > REPAIR TABLE should look up catalog/table like v2 commands > -- > > Key: SPARK-29512 > URL: https://issues.apache.org/jira/browse/SPARK-29512 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Terry Kim >Priority: Major > > REPAIR TABLE should look up catalog/table like v2 commands -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29014) DataSourceV2: Clean up current, default, and session catalog uses
[ https://issues.apache.org/jira/browse/SPARK-29014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29014. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26120 [https://github.com/apache/spark/pull/26120]
> DataSourceV2: Clean up current, default, and session catalog uses
> -
>
> Key: SPARK-29014
> URL: https://issues.apache.org/jira/browse/SPARK-29014
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Ryan Blue
> Assignee: Terry Kim
> Priority: Blocker
> Fix For: 3.0.0
>
> Catalog tracking in DSv2 has evolved since the initial changes went in. We need to make sure that handling is consistent across plans using the latest rules:
> * The _current_ catalog should be used when no catalog is specified
> * The _default_ catalog is the catalog _current_ is initialized to
> * If the _default_ catalog is not set, then it is the built-in Spark session catalog, which will be called `spark_catalog` (This is the v2 session catalog)
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
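The three catalog rules listed in the issue can be captured in a few lines of plain Scala. The class and method names below are illustrative simplifications, not Spark's actual API:

```scala
// Illustrative model of the DSv2 catalog rules (not Spark's real classes).
class SimpleCatalogManager(defaultCatalog: Option[String]) {
  // Rule 3: when no default catalog is set, it is the built-in
  // Spark session catalog, called "spark_catalog".
  val sessionCatalog: String = "spark_catalog"
  // Rule 2: the current catalog is initialized to the default catalog.
  var current: String = defaultCatalog.getOrElse(sessionCatalog)
  // Rule 1: the current catalog is used when no catalog is specified.
  def resolve(specified: Option[String]): String = specified.getOrElse(current)
}
```

For example, with no default configured, an unqualified name resolves against `spark_catalog`; with a default of `my_catalog`, it resolves against `my_catalog`; an explicitly specified catalog always wins.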
[jira] [Assigned] (SPARK-29014) DataSourceV2: Clean up current, default, and session catalog uses
[ https://issues.apache.org/jira/browse/SPARK-29014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-29014: --- Assignee: Terry Kim
> DataSourceV2: Clean up current, default, and session catalog uses
> -
>
> Key: SPARK-29014
> URL: https://issues.apache.org/jira/browse/SPARK-29014
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Ryan Blue
> Assignee: Terry Kim
> Priority: Blocker
>
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29511) DataSourceV2: Support CREATE NAMESPACE
Terry Kim created SPARK-29511: - Summary: DataSourceV2: Support CREATE NAMESPACE Key: SPARK-29511 URL: https://issues.apache.org/jira/browse/SPARK-29511 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Terry Kim CREATE NAMESPACE needs to support v2 catalogs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29481) all the commands should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954661#comment-16954661 ] Wenchen Fan commented on SPARK-29481: - I would really appreciate it if more people could help with this ticket. There are still many commands that need to be handled, e.g. REPAIR TABLE, REFRESH TABLE, ADD PARTITION, etc. We can look at `SparkSqlAstBuilder` and pick the commands that need to resolve a table. https://github.com/apache/spark/pull/26129 is a good example of how to add a v2 command:
1. Create a statement plan for the command.
2. Update `SqlBase.g4` and `AstBuilder`: use a multi-part table name for the command and create the statement plan after parsing.
3. Create a logical plan and physical plan for the command, if the command can be implemented via the v2 APIs (e.g. REFRESH TABLE).
4. Update `ResolveCatalogs` to convert the statement plan to the logical plan, if we created such a logical plan in step 3.
5. Update `ResolveSessionCatalog` to convert the statement plan to the old v1 command plan.
6. Add tests in `DDLSuite` and `DataSourceV2SQLSuite`.
Please create a sub-task of this ticket if you want to work on one command. cc [~imback82] [~rdblue] [~dongjoon] [~viirya]
> all the commands should look up catalog/table like v2 commands
> --
>
> Key: SPARK-29481
> URL: https://issues.apache.org/jira/browse/SPARK-29481
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Wenchen Fan
> Priority: Major
>
> This leads to very confusing behaviors, for example > {code} > USE my_catalog > DESC t // success and describe the table t from my_catalog > ANALYZE TABLE t // report table not found as there is no table t in the > session catalog > {code} > We should make sure all the commands have the same behavior regarding table > resolution -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
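The six-step recipe in the comment above can be sketched in miniature. The Scala below is a hypothetical, heavily simplified illustration of steps 1, 3, and 4; the class names (`AnalyzeTableStatement`, `AnalyzeTable`) and the `resolve` rule are stand-ins, not Spark's actual internals. The idea: the parser produces a statement plan carrying a raw multi-part name, and a `ResolveCatalogs`-style rule turns it into a catalog-aware logical plan, which is exactly what makes `USE my_catalog; ANALYZE TABLE t` resolve `t` against `my_catalog` instead of the session catalog.

```scala
// Step 1: statement plan produced by the parser (AstBuilder) - it only
// carries the multi-part name, with no catalog resolution yet.
sealed trait LogicalPlan
case class AnalyzeTableStatement(tableName: Seq[String]) extends LogicalPlan

// Step 3: catalog-aware logical plan for the v2 code path.
case class AnalyzeTable(catalog: String, ident: Seq[String]) extends LogicalPlan

// Step 4: a ResolveCatalogs-style rule. If the first name part matches a
// registered catalog, peel it off; otherwise fall back to the current catalog.
def resolve(plan: LogicalPlan, catalogs: Set[String], current: String): LogicalPlan =
  plan match {
    case AnalyzeTableStatement(parts) if parts.length > 1 && catalogs(parts.head) =>
      AnalyzeTable(parts.head, parts.tail)
    case AnalyzeTableStatement(parts) =>
      AnalyzeTable(current, parts)
    case other => other
  }
```

With this shape, every command that resolves a table goes through the same rule, which is what gives all commands consistent lookup behavior.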
[jira] [Updated] (SPARK-29510) JobGroup ID is not set for jobs submitted from Spark-SQL and Spark-Shell
[ https://issues.apache.org/jira/browse/SPARK-29510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ABHISHEK KUMAR GUPTA updated SPARK-29510: - Attachment: JobGroup2.png JobGroup3.png JobGroup1.png > JobGroup ID is not set for jobs submitted from Spark-SQL and Spark-Shell > > > Key: SPARK-29510 > URL: https://issues.apache.org/jira/browse/SPARK-29510 > Project: Spark > Issue Type: Bug > Components: Spark Shell, SQL > Affects Versions: 3.0.0 > Reporter: ABHISHEK KUMAR GUPTA > Priority: Major > Attachments: JobGroup1.png, JobGroup2.png, JobGroup3.png > > > When a user submits jobs from spark-shell or Spark SQL, the job group ID is not set (UI screenshots attached), but for jobs submitted from beeline the job group ID is set. > Steps: > create table customer(id int, name String, CName String, address String, city String, pin int, country String); > insert into customer values(1,'Alfred','Maria','Obere Str 57','Berlin',12209,'Germany'); > insert into customer values(2,'Ana','trujilo','Adva de la','Maxico D.F.',05021,'Maxico'); > insert into customer values(3,'Antonio','Antonio Moreno','Mataderos 2312','Maxico D.F.',05023,'Maxico'); -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29510) JobGroup ID is not set for jobs submitted from Spark-SQL and Spark-Shell
[ https://issues.apache.org/jira/browse/SPARK-29510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954645#comment-16954645 ] Ankit Raj Boudh commented on SPARK-29510: - I will start working on this issue. > JobGroup ID is not set for jobs submitted from Spark-SQL and Spark-Shell > > > Key: SPARK-29510 > URL: https://issues.apache.org/jira/browse/SPARK-29510 > Project: Spark > Issue Type: Bug > Components: Spark Shell, SQL > Affects Versions: 3.0.0 > Reporter: ABHISHEK KUMAR GUPTA > Priority: Major > > When a user submits jobs from spark-shell or Spark SQL, the job group ID is not set (UI screenshots attached), but for jobs submitted from beeline the job group ID is set. > Steps: > create table customer(id int, name String, CName String, address String, city String, pin int, country String); > insert into customer values(1,'Alfred','Maria','Obere Str 57','Berlin',12209,'Germany'); > insert into customer values(2,'Ana','trujilo','Adva de la','Maxico D.F.',05021,'Maxico'); > insert into customer values(3,'Antonio','Antonio Moreno','Mataderos 2312','Maxico D.F.',05023,'Maxico'); -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29510) JobGroup ID is not set for jobs submitted from Spark-SQL and Spark-Shell
ABHISHEK KUMAR GUPTA created SPARK-29510: Summary: JobGroup ID is not set for jobs submitted from Spark-SQL and Spark-Shell Key: SPARK-29510 URL: https://issues.apache.org/jira/browse/SPARK-29510 Project: Spark Issue Type: Bug Components: Spark Shell, SQL Affects Versions: 3.0.0 Reporter: ABHISHEK KUMAR GUPTA When a user submits jobs from spark-shell or Spark SQL, the job group ID is not set (UI screenshots attached), but for jobs submitted from beeline the job group ID is set. Steps: create table customer(id int, name String, CName String, address String, city String, pin int, country String); insert into customer values(1,'Alfred','Maria','Obere Str 57','Berlin',12209,'Germany'); insert into customer values(2,'Ana','trujilo','Adva de la','Maxico D.F.',05021,'Maxico'); insert into customer values(3,'Antonio','Antonio Moreno','Mataderos 2312','Maxico D.F.',05023,'Maxico'); -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
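For context on the mechanism involved: Spark tracks the job group as a thread-local "local property" on the driver (the real keys are `spark.jobGroup.id` and `spark.job.description`, set via `SparkContext.setJobGroup`); the Thrift server path used by beeline sets it, while the shell paths apparently do not. The toy class below (not Spark's `SparkContext`; the class name is hypothetical) sketches that thread-local property mechanism in isolation:

```scala
// Simplified sketch of Spark's thread-local "local properties" used to
// carry the job group. Each thread sees only the properties it set itself,
// which is why a submission path that never calls setJobGroup produces
// jobs with no group ID in the UI.
class LocalProps {
  private val props = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }
  // Mirrors the shape of SparkContext.setJobGroup(groupId, description).
  def setJobGroup(groupId: String, description: String): Unit =
    props.set(props.get + ("spark.jobGroup.id" -> groupId) + ("spark.job.description" -> description))
  def jobGroupId: Option[String] = props.get.get("spark.jobGroup.id")
}
```

Until setJobGroup is called on the submitting thread, `jobGroupId` stays empty, matching the behavior reported for spark-shell and Spark-SQL.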
[jira] [Created] (SPARK-29509) Deduplicate code blocks in Kafka data source
Jungtaek Lim created SPARK-29509: Summary: Deduplicate code blocks in Kafka data source Key: SPARK-29509 URL: https://issues.apache.org/jira/browse/SPARK-29509 Project: Spark Issue Type: Task Components: SQL, Structured Streaming Affects Versions: 3.0.0 Reporter: Jungtaek Lim There are a bunch of methods in the Kafka data source that have repeated lines within a method. In particular, they're tied to the number of fields in the writer schema, so every new field increases the amount of redundant code. This issue tracks the effort to deduplicate them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29508) Implicitly cast strings in datetime arithmetic operations
[ https://issues.apache.org/jira/browse/SPARK-29508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954579#comment-16954579 ] Maxim Gekk commented on SPARK-29508: I am working on it > Implicitly cast strings in datetime arithmetic operations > - > > Key: SPARK-29508 > URL: https://issues.apache.org/jira/browse/SPARK-29508 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4 >Reporter: Maxim Gekk >Priority: Minor > > To improve Spark SQL UX, strings can be cast to the `INTERVAL` or `TIMESTAMP` > types in the cases: > # Cast string to interval in interval - string > # Cast string to interval in datetime + string or string + datetime > # Cast string to timestamp in datetime - string or string - datetime -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29482) ANALYZE TABLE should look up catalog/table like v2 commands
[ https://issues.apache.org/jira/browse/SPARK-29482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-29482. Resolution: Fixed The issue is resolved in https://github.com/apache/spark/pull/26129 > ANALYZE TABLE should look up catalog/table like v2 commands > > > Key: SPARK-29482 > URL: https://issues.apache.org/jira/browse/SPARK-29482 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29508) Implicitly cast strings in datetime arithmetic operations
Maxim Gekk created SPARK-29508: -- Summary: Implicitly cast strings in datetime arithmetic operations Key: SPARK-29508 URL: https://issues.apache.org/jira/browse/SPARK-29508 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk To improve Spark SQL UX, strings can be cast to the `INTERVAL` or `TIMESTAMP` types in the cases: # Cast string to interval in interval - string # Cast string to interval in datetime + string or string + datetime # Cast string to timestamp in datetime - string or string - datetime -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
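The three coercion cases listed for SPARK-29508 can be illustrated outside of Spark. The sketch below is not Spark code: it uses `java.time` and a deliberately tiny interval parser (it only understands "N hours"; Spark's interval grammar is far richer) to show the intended type coercions, where the string operand is cast to whatever type the datetime operation needs:

```scala
import java.time.{Duration, Instant}

// Toy stand-in for casting a string to an interval ("2 hours" -> 2h).
def parseInterval(s: String): Duration = {
  val Array(n, unit) = s.trim.split("\\s+")
  require(unit.startsWith("hour"), s"unsupported unit in this sketch: $unit")
  Duration.ofHours(n.toLong)
}

// Case 2: datetime + string  =>  cast the string to an interval, then add.
def plusInterval(ts: Instant, interval: String): Instant =
  ts.plus(parseInterval(interval))

// Case 3: datetime - string  =>  cast the string to a timestamp, then
// subtract, yielding an interval.
def minusTimestamp(ts: Instant, other: String): Duration =
  Duration.between(Instant.parse(other), ts)
```

The point of the improvement is that users could write `ts + '2 hours'` or `ts - '2019-10-18 00:00:00'` directly, with the cast inserted implicitly by the analyzer.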
[jira] [Resolved] (SPARK-29478) Improve tooltip information for AggregatedLogs Tab
[ https://issues.apache.org/jira/browse/SPARK-29478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ABHISHEK KUMAR GUPTA resolved SPARK-29478. -- Resolution: Invalid It is not supported by Open Source community > Improve tooltip information for AggregatedLogs Tab > -- > > Key: SPARK-29478 > URL: https://issues.apache.org/jira/browse/SPARK-29478 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29489) ml.evaluation support log-loss
[ https://issues.apache.org/jira/browse/SPARK-29489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-29489: Assignee: zhengruifeng > ml.evaluation support log-loss > -- > > Key: SPARK-29489 > URL: https://issues.apache.org/jira/browse/SPARK-29489 > Project: Spark > Issue Type: New Feature > Components: ML, PySpark > Affects Versions: 3.0.0 > Reporter: zhengruifeng > Assignee: zhengruifeng > Priority: Major > > {color:#5a6e5a}log-loss (aka logistic loss or cross-entropy loss) is one of the most widely used metrics in classification tasks. It is already implemented in popular libraries like sklearn.{color} > {color:#5a6e5a}However, it is missing from Spark so far.{color} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29489) ml.evaluation support log-loss
[ https://issues.apache.org/jira/browse/SPARK-29489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-29489. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26135 [https://github.com/apache/spark/pull/26135] > ml.evaluation support log-loss > -- > > Key: SPARK-29489 > URL: https://issues.apache.org/jira/browse/SPARK-29489 > Project: Spark > Issue Type: New Feature > Components: ML, PySpark > Affects Versions: 3.0.0 > Reporter: zhengruifeng > Assignee: zhengruifeng > Priority: Major > Fix For: 3.0.0 > > > {color:#5a6e5a}log-loss (aka logistic loss or cross-entropy loss) is one of the most widely used metrics in classification tasks. It is already implemented in popular libraries like sklearn.{color} > {color:#5a6e5a}However, it is missing from Spark so far.{color} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark
[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954447#comment-16954447 ] zhao bo commented on SPARK-29106: - The reason for introducing the shell script is that I found that if we call the ansible script directly, it won't log information in real time. ;) > Add jenkins arm test for spark > -- > > Key: SPARK-29106 > URL: https://issues.apache.org/jira/browse/SPARK-29106 > Project: Spark > Issue Type: Test > Components: Tests > Affects Versions: 3.0.0 > Reporter: huangtianhua > Priority: Minor > > Add arm test jobs to amplab jenkins for spark. > So far we have made two periodic arm test jobs for spark in OpenLab: one is based on master with hadoop 2.7 (similar to the QA test of amplab jenkins), the other is based on a new branch we made on 09-09, see > [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64] > and > [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64] > We only have to care about the first one when integrating the arm test with amplab jenkins. > About the k8s test on arm, we have tested it, see > [https://github.com/theopenlab/spark/pull/17], and maybe we can integrate it later. > We also plan to test other stable branches, and we can integrate them into amplab when they are ready. > We have offered an arm instance and sent the info to shane knapp; thanks shane for adding the first arm job to amplab jenkins :) > The other important thing is about leveldbjni > [https://github.com/fusesource/leveldbjni|https://github.com/fusesource/leveldbjni/issues/80]: > spark depends on leveldbjni-all-1.8 > [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8], > and we can see there is no arm64 support. So we built an arm64-supporting release of leveldbjni, see > [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8], > but we can't modify the spark pom.xml directly with something like 'property'/'profile' to choose the correct jar package on arm or x86 platforms, because spark depends on some hadoop packages like hadoop-hdfs, and those packages depend on leveldbjni-all-1.8 too, unless hadoop releases a new arm-supporting leveldbjni jar. For now we download the leveldbjni-all-1.8 of openlabtesting and 'mvn install' it when arm testing spark. > PS: The issues found and fixed: > SPARK-28770 > [https://github.com/apache/spark/pull/25673] > > SPARK-28519 > [https://github.com/apache/spark/pull/25279] > > SPARK-28433 > [https://github.com/apache/spark/pull/25186] > > SPARK-28467 > [https://github.com/apache/spark/pull/25864] > > SPARK-29286 > [https://github.com/apache/spark/pull/26021] > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark
[ https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954445#comment-16954445 ] zhao bo commented on SPARK-29106: - Hi, I created a pretty simple shell script in /home/jenkins/ansible_test_scripts/, named "sample_shell_test.sh". You can run the script directly after setting a "SPARK_HOME" env. [~shaneknapp], how about we try this script with jenkins? > Add jenkins arm test for spark > -- > > Key: SPARK-29106 > URL: https://issues.apache.org/jira/browse/SPARK-29106 > Project: Spark > Issue Type: Test > Components: Tests > Affects Versions: 3.0.0 > Reporter: huangtianhua > Priority: Minor > > Add arm test jobs to amplab jenkins for spark. > So far we have made two periodic arm test jobs for spark in OpenLab: one is based on master with hadoop 2.7 (similar to the QA test of amplab jenkins), the other is based on a new branch we made on 09-09, see > [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64] > and > [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64] > We only have to care about the first one when integrating the arm test with amplab jenkins. > About the k8s test on arm, we have tested it, see > [https://github.com/theopenlab/spark/pull/17], and maybe we can integrate it later. > We also plan to test other stable branches, and we can integrate them into amplab when they are ready. > We have offered an arm instance and sent the info to shane knapp; thanks shane for adding the first arm job to amplab jenkins :) > The other important thing is about leveldbjni > [https://github.com/fusesource/leveldbjni|https://github.com/fusesource/leveldbjni/issues/80]: > spark depends on leveldbjni-all-1.8 > [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8], > and we can see there is no arm64 support. So we built an arm64-supporting release of leveldbjni, see > [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8], > but we can't modify the spark pom.xml directly with something like 'property'/'profile' to choose the correct jar package on arm or x86 platforms, because spark depends on some hadoop packages like hadoop-hdfs, and those packages depend on leveldbjni-all-1.8 too, unless hadoop releases a new arm-supporting leveldbjni jar. For now we download the leveldbjni-all-1.8 of openlabtesting and 'mvn install' it when arm testing spark. > PS: The issues found and fixed: > SPARK-28770 > [https://github.com/apache/spark/pull/25673] > > SPARK-28519 > [https://github.com/apache/spark/pull/25279] > > SPARK-28433 > [https://github.com/apache/spark/pull/25186] > > SPARK-28467 > [https://github.com/apache/spark/pull/25864] > > SPARK-29286 > [https://github.com/apache/spark/pull/26021] > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29262) DataFrameWriter insertIntoPartition function
[ https://issues.apache.org/jira/browse/SPARK-29262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954424#comment-16954424 ] feiwang commented on SPARK-29262: - I'll try to implement it. > DataFrameWriter insertIntoPartition function > > > Key: SPARK-29262 > URL: https://issues.apache.org/jira/browse/SPARK-29262 > Project: Spark > Issue Type: New Feature > Components: SQL > Affects Versions: 3.0.0 > Reporter: feiwang > Priority: Minor > > insertIntoPartition would be a useful function. > In SQL, the corresponding syntax is: > {code:java} > insert overwrite table tbl_a partition(p1=v1,p2=v2,...,pn=vn) select ... > {code} > In the example above, I specify all the partition key values, so it must be a static partition overwrite, regardless of whether dynamic partition overwrite is enabled. > If we enable dynamic partition overwrite, the SQL below will only overwrite the matching partitions, not the whole table. > If we disable dynamic partition overwrite, it will overwrite the whole table. > {code:java} > insert overwrite table tbl_a partition(p1,p2,...,pn) select ... > {code} > As of now, DataFrame does not support overwriting a specific partition. > This means that, for a partitioned table, if we insert overwrite via DataFrame with dynamic partition overwrite disabled, it will always overwrite the whole table. > So, we should support insertIntoPartition in DataFrameWriter. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
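The overwrite semantics described in SPARK-29262 can be captured in a few lines. The sketch below is purely hypothetical (no `insertIntoPartition` exists in Spark, and `PartitionSpec`/`overwriteScope` are invented names for illustration): a partition spec maps each partition column to an optional value, and a fully static spec (every key given a value) pins exactly one partition, making the overwrite scope independent of the dynamic-partition-overwrite setting.

```scala
// Hypothetical model of a partition spec: Some(value) = static key,
// None = dynamic key to be filled from the data.
case class PartitionSpec(spec: Map[String, Option[String]]) {
  def isFullyStatic: Boolean = spec.values.forall(_.isDefined)
}

// Overwrite scope per the issue description:
// - fully static spec: exactly that partition, regardless of the flag
// - dynamic keys + dynamic overwrite enabled: only matched partitions
// - dynamic keys + dynamic overwrite disabled: the whole table
def overwriteScope(spec: PartitionSpec, dynamicOverwrite: Boolean): String =
  if (spec.isFullyStatic) "single-partition"
  else if (dynamicOverwrite) "matched-partitions"
  else "whole-table"
```

The "whole-table" branch is the one the DataFrame API is stuck in today when dynamic partition overwrite is disabled, which is the motivation for the proposal.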
[jira] [Resolved] (SPARK-28120) RocksDB state storage
[ https://issues.apache.org/jira/browse/SPARK-28120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Agrawal resolved SPARK-28120. Resolution: Later The implementation will be submitted to https://spark-packages.org. > RocksDB state storage > - > > Key: SPARK-28120 > URL: https://issues.apache.org/jira/browse/SPARK-28120 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming > Affects Versions: 3.0.0 > Reporter: Vikram Agrawal > Priority: Major > > SPARK-13809 introduced a framework for state management for computing streaming aggregates. The default implementation was an in-memory hashmap that was backed up to an HDFS-compliant file system at the end of every micro-batch. > The current implementation suffers from performance and latency issues. It uses executor JVM memory to store the states, so the state store size is limited by the size of the executor memory. Also, executor JVM memory is shared between state storage and other task operations, so the size of the state storage impacts the performance of task execution. > Moreover, GC pauses, executor failures, and OOM issues are common as the state storage grows, which increases the overall latency of a micro-batch. > RocksDB is an embedded DB that can provide major performance improvements; other major streaming frameworks use RocksDB as their default state storage. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28022) k8s pod affinity to achieve cloud native friendly autoscaling
[ https://issues.apache.org/jira/browse/SPARK-28022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954398#comment-16954398 ] Jiaxin Shan commented on SPARK-28022: - I don't quite understand the use case here. It sounds like you want to place your executors as close together as possible. Kubernetes has native support for node affinity and pod affinity; they're a little different, even though both can make your pods sit close together at some level. # Node selector or node affinity: the k8s scheduler puts your application on a subset of the node pool. The problem is that if you have a large pool of certain nodes, it won't bin-pack inside the target node group. In a cloud environment, if you have the autoscaler enabled, it will ensure resources are utilized. # Pod affinity: the k8s scheduler will try to find a qualifying pod and put the following pods on the same node. My question is: can either of these address your use case? > k8s pod affinity to achieve cloud native friendly autoscaling > -- > > Key: SPARK-28022 > URL: https://issues.apache.org/jira/browse/SPARK-28022 > Project: Spark > Issue Type: New Feature > Components: Kubernetes > Affects Versions: 3.0.0 > Reporter: Henry Yu > Priority: Major > > Hi, in order to achieve cloud-native friendly autoscaling, I propose adding a pod affinity feature. > Traditionally, when we use spark on a fixed-size yarn cluster, it makes sense to spread containers across every node. > With cloud-native resource management, we want to release a node when we no longer need it. > The pod affinity feature aims to place all pods of a certain application on some nodes instead of all nodes. > By the way, using a pod template is not a good choice here; adding the application id to the pod affinity term at submit time is more robust. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29295) Duplicate result when dropping partition of an external table and then overwriting
[ https://issues.apache.org/jira/browse/SPARK-29295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29295. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25979 [https://github.com/apache/spark/pull/25979] > Duplicate result when dropping partition of an external table and then > overwriting > -- > > Key: SPARK-29295 > URL: https://issues.apache.org/jira/browse/SPARK-29295 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: feiwang >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.0.0 > > > When we drop a partition of a external table and then overwrite it, if we set > CONVERT_METASTORE_PARQUET=true(default value), it will overwrite this > partition. > But when we set CONVERT_METASTORE_PARQUET=false, it will give duplicate > result. > Here is a reproduce code below(you can add it into SQLQuerySuite in hive > module): > {code:java} > test("spark gives duplicate result when dropping a partition of an external > partitioned table" + > " firstly and they overwrite it") { > withTable("test") { > withTempDir { f => > sql("create external table test(id int) partitioned by (name string) > stored as " + > s"parquet location '${f.getAbsolutePath}'") > withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> > false.toString) { > sql("insert overwrite table test partition(name='n1') select 1") > sql("ALTER TABLE test DROP PARTITION(name='n1')") > sql("insert overwrite table test partition(name='n1') select 2") > checkAnswer( sql("select id from test where name = 'n1' order by > id"), > Array(Row(1), Row(2))) > } > withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> true.toString) > { > sql("insert overwrite table test partition(name='n1') select 1") > sql("ALTER TABLE test DROP PARTITION(name='n1')") > sql("insert overwrite table test partition(name='n1') select 2") > checkAnswer( sql("select id from test where name = 'n1' order by > id"), > Array(Row(2))) > } > } > 
} > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29295) Duplicate result when dropping partition of an external table and then overwriting
[ https://issues.apache.org/jira/browse/SPARK-29295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-29295: --- Assignee: L. C. Hsieh > Duplicate result when dropping partition of an external table and then > overwriting > -- > > Key: SPARK-29295 > URL: https://issues.apache.org/jira/browse/SPARK-29295 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: feiwang >Assignee: L. C. Hsieh >Priority: Major > > When we drop a partition of a external table and then overwrite it, if we set > CONVERT_METASTORE_PARQUET=true(default value), it will overwrite this > partition. > But when we set CONVERT_METASTORE_PARQUET=false, it will give duplicate > result. > Here is a reproduce code below(you can add it into SQLQuerySuite in hive > module): > {code:java} > test("spark gives duplicate result when dropping a partition of an external > partitioned table" + > " firstly and they overwrite it") { > withTable("test") { > withTempDir { f => > sql("create external table test(id int) partitioned by (name string) > stored as " + > s"parquet location '${f.getAbsolutePath}'") > withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> > false.toString) { > sql("insert overwrite table test partition(name='n1') select 1") > sql("ALTER TABLE test DROP PARTITION(name='n1')") > sql("insert overwrite table test partition(name='n1') select 2") > checkAnswer( sql("select id from test where name = 'n1' order by > id"), > Array(Row(1), Row(2))) > } > withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> true.toString) > { > sql("insert overwrite table test partition(name='n1') select 1") > sql("ALTER TABLE test DROP PARTITION(name='n1')") > sql("insert overwrite table test partition(name='n1') select 2") > checkAnswer( sql("select id from test where name = 'n1' order by > id"), > Array(Row(2))) > } > } > } > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29507) Support ALTER TABLE SET OWNER command
Kent Yao created SPARK-29507: Summary: Support ALTER TABLE SET OWNER command Key: SPARK-29507 URL: https://issues.apache.org/jira/browse/SPARK-29507 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Kent Yao see https://jira.apache.org/jira/browse/HIVE-18762 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29405) Alter table / Insert statements should not change a table's ownership
[ https://issues.apache.org/jira/browse/SPARK-29405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-29405: --- Assignee: Kent Yao > Alter table / Insert statements should not change a table's ownership > - > > Key: SPARK-29405 > URL: https://issues.apache.org/jira/browse/SPARK-29405 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.3.4, 2.4.4 > Reporter: Kent Yao > Assignee: Kent Yao > Priority: Major > > When executing an 'insert into/overwrite ...' DML, or an 'alter table set tblproperties ...' DDL, spark would change the ownership of the table to the one who runs the spark application. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29405) Alter table / Insert statements should not change a table's ownership
[ https://issues.apache.org/jira/browse/SPARK-29405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29405. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26068 [https://github.com/apache/spark/pull/26068] > Alter table / Insert statements should not change a table's ownership > - > > Key: SPARK-29405 > URL: https://issues.apache.org/jira/browse/SPARK-29405 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.3.4, 2.4.4 > Reporter: Kent Yao > Assignee: Kent Yao > Priority: Major > Fix For: 3.0.0 > > > When executing an 'insert into/overwrite ...' DML, or an 'alter table set tblproperties ...' DDL, spark would change the ownership of the table to the one who runs the spark application. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29444) Add configuration to support JacksonGenerator to keep fields with null values
[ https://issues.apache.org/jira/browse/SPARK-29444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-29444: --- Assignee: Jackey Lee > Add configuration to support JacksonGenerator to keep fields with null values > > > Key: SPARK-29444 > URL: https://issues.apache.org/jira/browse/SPARK-29444 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.0.0 > Reporter: Jackey Lee > Assignee: Jackey Lee > Priority: Major > Fix For: 3.0.0 > > > Dataset.toJSON will lose some columns when field data is null, and it may be better to keep null data in some scenarios. For example, sparkmagic, which is widely used in jupyter with livy, uses toJSON to get SQL results; with the current behavior, sparkmagic may return empty results, which confuses users. > Adding a config may be the best choice: by default it retains the current semantics, and fields with null values are kept only when the configuration is changed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29444) Add configuration to support JacksonGenerator to keep fields with null values
[ https://issues.apache.org/jira/browse/SPARK-29444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29444. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26098 [https://github.com/apache/spark/pull/26098] > Add configuration to support JacksonGenerator to keep fields with null values > > > Key: SPARK-29444 > URL: https://issues.apache.org/jira/browse/SPARK-29444 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.0.0 > Reporter: Jackey Lee > Priority: Major > Fix For: 3.0.0 > > > Dataset.toJSON will lose some columns when field data is null, and it may be better to keep null data in some scenarios. For example, sparkmagic, which is widely used in jupyter with livy, uses toJSON to get SQL results; with the current behavior, sparkmagic may return empty results, which confuses users. > Adding a config may be the best choice: by default it retains the current semantics, and fields with null values are kept only when the configuration is changed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
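The behavioral difference behind SPARK-29444 is easy to show with a toy. The function below is not Spark's JacksonGenerator, just an illustration of the two modes the config would switch between: with `keepNulls = false` (today's behavior) null fields are dropped from the JSON output, and with `keepNulls = true` they are emitted as explicit `null` values.

```scala
// Toy JSON emitter over (column, optional value) pairs. A Seq of pairs is
// used (rather than a Map) so field order is deterministic.
def toJson(row: Seq[(String, Option[Any])], keepNulls: Boolean): String = {
  val fields = row.flatMap {
    case (k, Some(s: String))   => Some(s"\"$k\":\"$s\"")  // quote string values
    case (k, Some(v))           => Some(s"\"$k\":$v")      // numbers etc. as-is
    case (k, None) if keepNulls => Some(s"\"$k\":null")    // proposed opt-in behavior
    case _                      => None                    // current behavior: drop
  }
  fields.mkString("{", ",", "}")
}
```

A row whose fields are all null serializes to `{}` in the first mode, which is why a consumer like sparkmagic can see "empty" results.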
[jira] [Resolved] (SPARK-29092) EXPLAIN FORMATTED does not work well with DPP
[ https://issues.apache.org/jira/browse/SPARK-29092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29092. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26042 [https://github.com/apache/spark/pull/26042] > EXPLAIN FORMATTED does not work well with DPP > - > > Key: SPARK-29092 > URL: https://issues.apache.org/jira/browse/SPARK-29092 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Dilip Biswal >Priority: Major > Fix For: 3.0.0 > > > > {code:java} > withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true", > SQLConf.DYNAMIC_PARTITION_PRUNING_REUSE_BROADCAST.key -> "false") { > withTable("df1", "df2") { > spark.range(1000) > .select(col("id"), col("id").as("k")) > .write > .partitionBy("k") > .format(tableFormat) > .mode("overwrite") > .saveAsTable("df1") > spark.range(100) > .select(col("id"), col("id").as("k")) > .write > .partitionBy("k") > .format(tableFormat) > .mode("overwrite") > .saveAsTable("df2") > sql("EXPLAIN FORMATTED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = > df2.k AND df2.id < 2") > .show(false) > sql("EXPLAIN EXTENDED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = > df2.k AND df2.id < 2") > .show(false) > } > } > {code} > The output of EXPLAIN EXTENDED is expected. 
> {code:java} > == Physical Plan == > *(2) Project [id#2721L, k#2724L] > +- *(2) BroadcastHashJoin [k#2722L], [k#2724L], Inner, BuildRight >:- *(2) ColumnarToRow >: +- FileScan parquet default.df1[id#2721L,k#2722L] Batched: true, > DataFilters: [], Format: Parquet, Location: > PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., > PartitionFilters: [isnotnull(k#2722L), dynamicpruningexpression(k#2722L IN > subquery2741)], PushedFilters: [], ReadSchema: struct >:+- Subquery subquery2741, [id=#358] >: +- *(2) HashAggregate(keys=[k#2724L], functions=[], > output=[k#2724L#2740L]) >: +- Exchange hashpartitioning(k#2724L, 5), true, [id=#354] >: +- *(1) HashAggregate(keys=[k#2724L], functions=[], > output=[k#2724L]) >:+- *(1) Project [k#2724L] >: +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L > < 2)) >: +- *(1) ColumnarToRow >: +- FileScan parquet > default.df2[id#2723L,k#2724L] Batched: true, DataFilters: > [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, Location: > PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., > PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), > LessThan(id,2)], ReadSchema: struct >+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > true])), [id=#379] > +- *(1) Project [k#2724L] > +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L < 2)) > +- *(1) ColumnarToRow >+- FileScan parquet default.df2[id#2723L,k#2724L] Batched: > true, DataFilters: [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, > Location: > PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., > PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), > LessThan(id,2)], ReadSchema: struct > {code} > However, the output of FileScan node of EXPLAIN FORMATTED does not show the > effect of DPP > {code:java} > * Project (9) > +- * BroadcastHashJoin Inner BuildRight (8) >:- * 
ColumnarToRow (2) >: +- Scan parquet default.df1 (1) >+- BroadcastExchange (7) > +- * Project (6) > +- * Filter (5) > +- * ColumnarToRow (4) >+- Scan parquet default.df2 (3) > (1) Scan parquet default.df1 > Output: [id#2716L, k#2717L] > {code}
[jira] [Assigned] (SPARK-29092) EXPLAIN FORMATTED does not work well with DPP
[ https://issues.apache.org/jira/browse/SPARK-29092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-29092: --- Assignee: Dilip Biswal > EXPLAIN FORMATTED does not work well with DPP > - > > Key: SPARK-29092 > URL: https://issues.apache.org/jira/browse/SPARK-29092 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Dilip Biswal >Priority: Major > > > {code:java} > withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true", > SQLConf.DYNAMIC_PARTITION_PRUNING_REUSE_BROADCAST.key -> "false") { > withTable("df1", "df2") { > spark.range(1000) > .select(col("id"), col("id").as("k")) > .write > .partitionBy("k") > .format(tableFormat) > .mode("overwrite") > .saveAsTable("df1") > spark.range(100) > .select(col("id"), col("id").as("k")) > .write > .partitionBy("k") > .format(tableFormat) > .mode("overwrite") > .saveAsTable("df2") > sql("EXPLAIN FORMATTED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = > df2.k AND df2.id < 2") > .show(false) > sql("EXPLAIN EXTENDED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = > df2.k AND df2.id < 2") > .show(false) > } > } > {code} > The output of EXPLAIN EXTENDED is expected. 
> {code:java} > == Physical Plan == > *(2) Project [id#2721L, k#2724L] > +- *(2) BroadcastHashJoin [k#2722L], [k#2724L], Inner, BuildRight >:- *(2) ColumnarToRow >: +- FileScan parquet default.df1[id#2721L,k#2722L] Batched: true, > DataFilters: [], Format: Parquet, Location: > PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., > PartitionFilters: [isnotnull(k#2722L), dynamicpruningexpression(k#2722L IN > subquery2741)], PushedFilters: [], ReadSchema: struct >:+- Subquery subquery2741, [id=#358] >: +- *(2) HashAggregate(keys=[k#2724L], functions=[], > output=[k#2724L#2740L]) >: +- Exchange hashpartitioning(k#2724L, 5), true, [id=#354] >: +- *(1) HashAggregate(keys=[k#2724L], functions=[], > output=[k#2724L]) >:+- *(1) Project [k#2724L] >: +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L > < 2)) >: +- *(1) ColumnarToRow >: +- FileScan parquet > default.df2[id#2723L,k#2724L] Batched: true, DataFilters: > [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, Location: > PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., > PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), > LessThan(id,2)], ReadSchema: struct >+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > true])), [id=#379] > +- *(1) Project [k#2724L] > +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L < 2)) > +- *(1) ColumnarToRow >+- FileScan parquet default.df2[id#2723L,k#2724L] Batched: > true, DataFilters: [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, > Location: > PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., > PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), > LessThan(id,2)], ReadSchema: struct > {code} > However, the output of FileScan node of EXPLAIN FORMATTED does not show the > effect of DPP > {code:java} > * Project (9) > +- * BroadcastHashJoin Inner BuildRight (8) >:- * 
ColumnarToRow (2) >: +- Scan parquet default.df1 (1) >+- BroadcastExchange (7) > +- * Project (6) > +- * Filter (5) > +- * ColumnarToRow (4) >+- Scan parquet default.df2 (3) > (1) Scan parquet default.df1 > Output: [id#2716L, k#2717L] > {code}
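The structure of EXPLAIN FORMATTED output, and why the bug matters, can be sketched in miniature: the command prints a compact tree where each operator gets a number, followed by a detail section per numbered operator, and the complaint is that the FileScan detail entry omits the dynamic-pruning PartitionFilters that EXPLAIN EXTENDED shows. A toy printer under assumed names (hypothetical `Node`/`explain_formatted`, not Spark's code):

```python
# Minimal sketch of EXPLAIN FORMATTED-style output: a numbered operator tree
# plus a per-operator detail section. Classes and names are hypothetical.
class Node:
    def __init__(self, name, children=(), details=None):
        self.name, self.children, self.details = name, list(children), details or {}

def explain_formatted(root):
    ids, order = {}, []
    def number(n):                       # assign ids bottom-up, leaves first
        for c in n.children:
            number(c)
        ids[n] = len(ids) + 1
        order.append(n)
    number(root)
    def tree(n, indent=0):               # compact header tree: "Name (id)"
        lines = [" " * indent + f"{n.name} ({ids[n]})"]
        for c in n.children:
            lines += tree(c, indent + 3)
        return lines
    # Detail section: this is where the DPP PartitionFilters must appear
    # for the output to be useful, which is the gap this issue reports.
    details = [f"({ids[n]}) {n.name}: {n.details}" for n in order if n.details]
    return "\n".join(tree(root) + details)

scan = Node("Scan parquet default.df1",
            details={"PartitionFilters": "dynamicpruningexpression(k IN subquery)"})
print(explain_formatted(Node("Project", [scan])))
```

In the fixed behaviour, the `(1) Scan parquet default.df1` detail block would carry the PartitionFilters line, mirroring what EXPLAIN EXTENDED already prints inline.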
[jira] [Commented] (SPARK-29505) desc extended is case sensitive
[ https://issues.apache.org/jira/browse/SPARK-29505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954330#comment-16954330 ] Shivu Sondur commented on SPARK-29505: -- I am checking this issue > desc extended is case sensitive > -- > > Key: SPARK-29505 > URL: https://issues.apache.org/jira/browse/SPARK-29505 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > create table customer(id int, name String, *CName String*, address String, > city String, pin int, country String); > insert into customer values(1,'Alfred','Maria','Obere Str > 57','Berlin',12209,'Germany'); > insert into customer values(2,'Ana','trujilo','Adva de la','Maxico > D.F.',05021,'Maxico'); > insert into customer values(3,'Antonio','Antonio Moreno','Mataderos > 2312','Maxico D.F.',05023,'Maxico'); > analyze table customer compute statistics for columns cname; – *Success( > Though cname is not as CName)* > desc extended customer cname; – Failed > jdbc:hive2://10.18.19.208:23040/default> desc extended customer *cname;* > +-+-+ > | info_name | info_value | > +-+-+ > | col_name | cname | > | data_type | string | > | comment | NULL | > | min | NULL | > | max | NULL | > | num_nulls | NULL | > | distinct_count | NULL | > | avg_col_len | NULL | > | max_col_len | NULL | > | histogram | NULL | > +-+-- > > But > desc extended customer CName; – SUCCESS > 0: jdbc:hive2://10.18.19.208:23040/default> desc extended customer *CName;* > +-+-+ > | info_name | info_value | > +-+-+ > | col_name | CName | > | data_type | string | > | comment | NULL | > | min | NULL | > | max | NULL | > | num_nulls | 0 | > | distinct_count | 3 | > | avg_col_len | 9 | > | max_col_len | 14 | > | histogram | NULL | > +-+-+ > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
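The expected behaviour here is that column lookup honours case-insensitive resolution (Spark's `spark.sql.caseSensitive` defaults to false), so `cname` and `CName` should resolve to the same column in both ANALYZE TABLE and `desc extended`. A minimal sketch of such resolution (the `resolve_column` helper is hypothetical, not Spark's analyzer):

```python
# Sketch of case-insensitive column resolution, the behaviour
# `desc extended <table> <column>` should arguably share with ANALYZE TABLE.
# `resolve_column` is a hypothetical helper, not Spark code.
def resolve_column(schema, name, case_sensitive=False):
    """Return the canonical column name from `schema` matching `name`, or None."""
    for col in schema:
        if col == name or (not case_sensitive and col.lower() == name.lower()):
            return col
    return None

schema = ["id", "name", "CName", "address", "city", "pin", "country"]
print(resolve_column(schema, "cname"))                       # CName
print(resolve_column(schema, "cname", case_sensitive=True))  # None
```

Note the lookup returns the *canonical* name from the schema, so downstream code (e.g. the column-statistics store) keys on one spelling regardless of how the user typed it.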
[jira] [Created] (SPARK-29506) Use dynamicPartitionOverwrite in FileCommitProtocol when insert into hive table
L. C. Hsieh created SPARK-29506: --- Summary: Use dynamicPartitionOverwrite in FileCommitProtocol when insert into hive table Key: SPARK-29506 URL: https://issues.apache.org/jira/browse/SPARK-29506 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: L. C. Hsieh Assignee: L. C. Hsieh When doing an insert overwrite into a Hive table, enable dynamicPartitionOverwrite when initializing FileCommitProtocol. HadoopMapReduceCommitProtocol uses FileOutputCommitter to commit job output files. FileOutputCommitter recursively calls FileSystem.listStatus on the partition directories and commits the job output leaf files one by one, which is inefficient when dynamically overwriting many partitions and files. With dynamicPartitionOverwrite enabled, HadoopMapReduceCommitProtocol instead writes to a staging directory and commits whole partition directories rather than leaf files.
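The efficiency argument can be sketched as a toy model: committing per leaf file costs one operation per file (after a recursive listing), while the dynamic-partition-overwrite path costs one directory rename per touched partition. Paths and helper names below are hypothetical; the real logic lives in HadoopMapReduceCommitProtocol and FileOutputCommitter:

```python
# Toy model of the two commit strategies contrasted in SPARK-29506.
# Hypothetical paths/helpers; real code goes through FileOutputCommitter.
def commit_per_leaf_file(staged):
    """Old path: visit every staged leaf file and move each one individually."""
    return [f"rename {f} -> final/{part}/{f.split('/')[-1]}"
            for part, files in sorted(staged.items()) for f in files]

def commit_per_partition_dir(staged):
    """dynamicPartitionOverwrite path: one rename per dynamic partition dir."""
    return [f"rename staging/{part} -> final/{part}" for part in sorted(staged)]

staged = {"k=1": ["staging/k=1/part-0", "staging/k=1/part-1"],
          "k=2": ["staging/k=2/part-0"]}
print(len(commit_per_leaf_file(staged)))      # 3 operations, one per file
print(len(commit_per_partition_dir(staged)))  # 2 operations, one per partition
```

With many partitions each holding many files, the per-partition commit replaces O(files) renames (plus the recursive listing needed to find them) with O(partitions) directory renames.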
[jira] [Created] (SPARK-29505) desc extended is case sensitive
ABHISHEK KUMAR GUPTA created SPARK-29505: Summary: desc extended is case sensitive Key: SPARK-29505 URL: https://issues.apache.org/jira/browse/SPARK-29505 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: ABHISHEK KUMAR GUPTA create table customer(id int, name String, *CName String*, address String, city String, pin int, country String); insert into customer values(1,'Alfred','Maria','Obere Str 57','Berlin',12209,'Germany'); insert into customer values(2,'Ana','trujilo','Adva de la','Maxico D.F.',05021,'Maxico'); insert into customer values(3,'Antonio','Antonio Moreno','Mataderos 2312','Maxico D.F.',05023,'Maxico'); analyze table customer compute statistics for columns cname; – *Success( Though cname is not as CName)* desc extended customer cname; – Failed jdbc:hive2://10.18.19.208:23040/default> desc extended customer *cname;* +-+-+ | info_name | info_value | +-+-+ | col_name | cname | | data_type | string | | comment | NULL | | min | NULL | | max | NULL | | num_nulls | NULL | | distinct_count | NULL | | avg_col_len | NULL | | max_col_len | NULL | | histogram | NULL | +-+-- But desc extended customer CName; – SUCCESS 0: jdbc:hive2://10.18.19.208:23040/default> desc extended customer *CName;* +-+-+ | info_name | info_value | +-+-+ | col_name | CName | | data_type | string | | comment | NULL | | min | NULL | | max | NULL | | num_nulls | 0 | | distinct_count | 3 | | avg_col_len | 9 | | max_col_len | 14 | | histogram | NULL | +-+-+ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org